85 minutes
Business, Data Science

SDS 587: Data Engineering for Data Scientists

Senior Data Scientist at Humu, Mark Freeman joins the show to talk all things data engineering. Mark also sheds light on what it takes to get promoted and shares his number one tip for getting hired at a fast-growing startup.

About Mark Freeman

Mark Freeman is a community health advocate turned data scientist interested in the intersection of social impact, business, and technology. His life’s mission is to improve the well-being of as many people as possible through data. Mark received his M.S. from the Stanford School of Medicine where he was trained in clinical research, experimental design, and statistics with an emphasis on observational studies. He is currently a senior data scientist at Humu, where he builds data tools that drive behavior change to make work better. His core responsibilities include 1) building data products that reach Humu’s end users, 2) providing product analytics to Humu’s product team, and 3) building data infrastructure to drive data maturity.

Overview

As a Senior Data Scientist at Humu, Mark Freeman brings his infectious passion, expert knowledge, and experience to the podcast for a fun episode that explores data engineering and career guidance. Jon and Mark started the conversation with a simple question: what is data engineering? According to Mark, it involves “preparing and organizing data specifically to drive value and spans across the whole data lifecycle.”

As someone who recently transitioned into data engineering, Mark shed light on how he made the successful change. It came down to simply speaking with his manager, leaning into his talents, moving away from being task-based, and choosing projects that move the needle forward for a business metric.

And when it comes to differentiating roles between junior, senior, and staff scientists, Mark explains that the level of influence within an organization is the one determining factor. “How can you clearly identify a problem or opportunity, create a strategy around why it’s worthwhile to pursue, and then create the thought process and generate consistent buy-in.”

At Humu, Mark focuses on improving workplace behavior using machine learning. Although Python and R were the main focus of his ML learning journey, he admits that he uses SQL for much of his work. He switched to SQL after discovering that he was taking twice as long to complete tasks as his co-workers who were already using it.

Mark also praises R, especially for multilevel modeling, and turns to Python when operationalizing his research and analytics and integrating them into the products at Humu.

Tune in to hear more about Mark’s top data extraction, modeling, and pipeline engineering tools, his number one tip for getting hired at a fast-growing venture capital-backed startup, and why all data scientists should be interested in Web3.

In this episode you will learn:

How Humu leverages data and machine learning to improve workplace behaviors [10:38]
What is data engineering? [14:21]
What it takes to get promoted into more senior data science roles [20:55]
The differences between junior, senior, and staff data scientists [30:21]
Mark’s top tools for data extraction, modeling, and pipeline engineering [37:08]
Mark’s number one tip for getting hired at a fast-growing venture capital-backed startup [53:10]
Why all data scientists should be interested in Web3 [1:11:53]

Episode Transcript

Podcast Transcript

Jon Krohn: 00:00:00

This is episode number 587 with Mark Freeman, Senior Data Scientist at Humu.

Jon Krohn: 00:00:10

Welcome to the SuperDataScience Podcast, the most listened-to podcast in the data science industry. Each week, we bring you inspiring people and ideas to help you build a successful career in data science. I’m your host, Jon Krohn. Thanks for joining me today, and now let’s make the complex simple.

Jon Krohn: 00:00:41

Welcome back to the SuperDataScience Podcast. Today on the program, we’ve got the tremendously jovial and inspiring data scientist, Mark Freeman. Mark is a Senior Data Scientist with a data engineering specialization at a Bay Area startup called Humu, which has raised $100 million in venture capital to help people grow into better leaders, managers, and teammates, through science and machine learning. He posts original, insightful tips on data science and software engineering every weekday on LinkedIn, rapidly growing his following on the platform to 20,000 people. Previously, he worked as a data scientist at Verana Health, another well-funded Bay Area startup, and as a data analyst at the Stanford University School of Medicine. He also holds a Master’s in Community Health and Prevention Research, a statistics-heavy discipline, from the Stanford Medical School.

Jon Krohn: 00:01:30

Today’s episode is geared toward listeners who are already in a technical role, such as data scientists, data engineers, ML engineers, or software engineers, as well as to folks who’d like to grow into those kinds of roles. In today’s episode, Mark details the differences between junior, senior and staff data scientists, what it takes to get promoted into more senior data science roles, how data engineering differs from data science, his top tips for data extraction, modeling, and pipeline engineering, his number one tip for getting hired at a fast-growing venture capital-backed startup, how behavioral nudges can drastically improve workplace experiences, and why all data scientists should be interested in Web3. All right, you ready for this action-packed, laugh-filled episode? Let’s go.

Jon Krohn: 00:02:21

Mark Freeman, welcome to the SuperDataScience Podcast. I’m so glad to have you on the program. Where in the world are you calling in from?

Mark Freeman: 00:02:29

Hey, Jon. Super excited to be here, calling from San Francisco Bay Area specifically Pacifica, which is by the coast, 15 minutes away from SF.

Jon Krohn: 00:02:44

15 minutes from San Francisco, and you can walk out onto the beach. Isn’t that right?

Mark Freeman: 00:02:49

That is correct, but with COVID, the apartment prices dropped really low, and so, beyond the pandemic being rough and being laid off at one point, the highlight is, I got this really nice apartment for cheap, and they gave us all these bonuses to move in because they couldn’t get anyone to move in, and I’ve been thoroughly enjoying this. But the other downside, though, is that my apartment’s very small and my closet’s my office.

Jon Krohn: 00:03:22

Which actually we were talking about. So if you’re listening to the audio only version of this show, like most listeners, I encourage you to check out the closet that Mark has put a whole office in. So it’s got all of the things that an office has. It’s got his office chair, it’s got lighting, it’s got bulletin boards and calendars, and it’s even got a placard with his first check that he got, I guess for independent work? It wasn’t your first paycheck at like a regular job?

Mark Freeman: 00:03:58

Yeah. It’s like content creation, side hustle kind of thing.

Jon Krohn: 00:04:01

Yeah. And that’s for SuperDataScience, the website, www.superdatascience.com.

Mark Freeman: 00:04:11

Exactly. Full circle. It was special. I think I’ve talked to other podcasts before, but I’m really into entrepreneurship. I’ve started five companies. This is my fifth one. That sounds impressive, right? No, this is the first one I’ve made money on. I’ve made a lot of mistakes, and sometimes it’s just even hard to get to the starting point because… You’ve been in startups, so you know how hard it is. I’ve tried everything from small side hustles to large scale, where we interview for incubators and pitching. This is the one, the content creation’s the one that stuck, that’s been working, and it was completely by accident, doing the LinkedIn stuff.

Jon Krohn: 00:04:49

Yeah. I mean, it certainly is working. You’ve got a ton of followers on LinkedIn. You’ve got over 19,000. Probably by the time this episode airs, you’ll be pushing 20K, and that is super impressive. The content creation is definitely working, and so that’s actually how I know you. I often tell listeners how I know our guests, and in your case, it was over a year ago, not long after I became a host of the SuperDataScience Podcast, someone at the www.superdatascience.com company, which actually is a separate entity, but we have a similar lineage. And so sometimes there’s overlap. And as regular listeners will know www.superdatascience.com is a frequent sponsor of the SuperDataScience Podcast. But somebody reached out from www.superdatascience.com and said, “We’ve got this great new instructor, Mark Freeman. He’s actually interested in being on the podcast,” and so I put you down on my short list of potential speakers, and was just waiting for the right time. And now it’s the right time.

Mark Freeman: 00:05:55

Here we go. It really just kind of shows you never know who’s watching when you’re doing your content stuff, especially… A year ago, I was, I think, around 5K, if anything. I was still really early.

Jon Krohn: 00:06:07

Oh wow. Really?

Mark Freeman: 00:06:09

Maybe 10K. Either way, I was still very early in the content journey, and I think I was telling you earlier, many times when I have people reach out to collaborate for content, for fun data science things, they’ll be like, “Yeah, I’ve been watching for a while,” and they’ve never engaged with my content. They show up one day and they’ve been watching for a while. So just show up for yourself and soon enough other people will show up for you too.

Jon Krohn: 00:06:39

Yeah, that’s a really good point. If you think about it, when you make a post on LinkedIn, and actually, I bet most listeners haven’t made a post on LinkedIn because I can’t remember the exact stat, but it recently came across my screen, that it is a really small percentage of people that make LinkedIn posts at all. So maybe you haven’t, but if you do, then you can see how many impressions the post has had in real time. You compare that with the people who have reacted or commented, and it’s a small fraction. For some posts it’s like 100:1, the number of views relative to reactions. And so yeah, there definitely are a lot of people out there just watching. Creeping, and sometimes they’re people you know really well. Like I’ll be at a dinner party or something, and somebody will be talking about the specifics of some post and be like, “Oh yeah, I read everything you write,” and I’m like, “You do? You have never reacted, ever.”

Mark Freeman: 00:07:39

That happens at my job all the time. They’re like, “I really liked your post.” I’m like, “I didn’t even know you saw this.”

Jon Krohn: 00:07:45

You really liked it, but you didn’t actually like it. Come on man.

Mark Freeman: 00:07:53

Yeah. I create content for everyone, even if you don’t like or comment, which is preferred, but either way I’m creating content to teach people and share ideas.

Jon Krohn: 00:08:03

I am only creating content for reactors.

Mark Freeman: 00:08:07

Yeah.

Jon Krohn: 00:08:07

Everyone else, go away. I don’t even want your views. No, please. I want your views. I’m so needy. So tell us a little bit about that. Tell us a bit about this content creation, like what kind of content are you creating that’s attracting so many followers? How’d you get started on that?

Mark Freeman: 00:08:27

So as I mentioned earlier, during the pandemic, I was laid off about two years ago, and that sounds really sad, but it ended up being one of the best things ever to happen. I kind of had this wild idea of, applying to data science jobs is not a fun process. You go send your application out to the void and you get no response. So I was like, “You know what, let’s flip this.”

Jon Krohn: 00:08:49

Especially at that time.

Mark Freeman: 00:08:51

Yeah. I was like, “I’m going to have jobs apply to me.” And so instead of me going out, I was like, I’m going to create one piece of LinkedIn content that provides value to at least one person, every single day, and I created this hashtag called the layoff hustle, where I documented from day one I was laid off, to the day I got my next job, to show people how they can do it. My thought process was using the AIDA model, which is a sales funnel: Awareness, Interest, Desire, Action. Awareness is that content piece, make people aware that you’re looking for a job and you have these data science skills. And then the Interest, Desire, Action is like the interview steps. But that strategy worked and it made me see the power of LinkedIn, and that encouraged me to keep on posting. So I started posting maybe like once a week, three times a week, and now I post Monday through Friday. And it’s been a really wild ride.

Mark Freeman: 00:09:46

What happens is, because you’re posting and people are aware of you all the time, they’re like, “Oh, Mark would be so cool to collaborate with,” and so some of my favorite projects are when people come up to me like, “Hey, we want to do X, Y, Z. Can you use our product to…” Their open source tool or something like that, to create a fun project. And because I love data science, I’m already doing these projects for fun. So I’m like, “Yeah, of course.” Cool, I get some cool data to work with or teach people I couldn’t reach before. And it turns into these really cool opportunities and I’ve been running with it.

Jon Krohn: 00:10:24

Cool. Well, congrats. I love that you started doing it, that you had this, not just silver lining to being laid off, but actually something that’s accelerated your career, not only as a content creator, but also as a data scientist, to wit, you’re working at a company called Humu now. So Humu is a startup that combines science and machine learning, to enable people to grow into better leaders, better managers or better teammates. How does the Humu technology work, and what’s your role there?

Mark Freeman: 00:10:55

Definitely, and really quick to tie it back to the content creation; if you’re on the fence about just posting on LinkedIn, do it, because the reason why I’m able to grow so much in my current job at Humu, is because I’ve built up this network of basically… I call my followers, basically mentors. I have 20,000 mentors basically, teaching me all the things they’ve learned. I go talk to them all the time, whether in LinkedIn chats or calls or podcasts, and I bring it back to my job to solve some really fun problems. And so the problems we try to solve at Humu, basically the way it was pitched to me two years ago, was building AI to make people happy, which is a great thing to feel great about. But the reality of it is, essentially just diving in, there’s these things called nudges, and there’s a lot of behavioral economics around them. Essentially, there are many interventions to push people towards better behaviors.

Mark Freeman: 00:11:52

One of the best places for behavior change is the workplace; people are at work all the time, it’s the main institutional thing. And so the question then is like, how can we push people to enjoy their work? Whether it’s creating a more inclusive work environment, having a manager that’s phenomenal, right? Giving them the data to understand what to do, and then tying that information into actual insights or actionable advice through our nudges, that appear via email, Slack, or Microsoft Teams, wherever your platform is. So it’s a really cool data problem because you have a whole list of advice, a whole list of people with different attributes. How do you match the right advice at the right time to the right person? It’s a super fun data problem to work on. And so my role, being in a startup, when I joined, the company was around 50 people, now around like 130, 150, so it’s like completely different. Our data science team was small and mighty, so I’ve done a little bit of everything from building NLP models, let me rephrase that, not building a full NLP model, but taking existing NLP models and putting them into production for our product, to doing product analytics to understand which product direction we go. And now, I’ve just been diving into data engineering as a data scientist, and it’s been super fun.

Jon Krohn: 00:13:18

This episode is brought to you by SuperDataScience, our online membership platform for learning data science at any level. Yes, the platform is called SuperDataScience. It’s the namesake of this very podcast. In the platform, you’ll discover all of our 50+ courses, which together provide over 300 hours of content, with new courses being added on average once per month. All of that, and more you get as part of your membership at SuperDataScience. So don’t hold off. Sign up today at www.superdatascience.com. Secure your membership and take your data science skills to the next level.

Jon Krohn: 00:13:55

Yeah, so you do work as a senior data scientist with… at least on LinkedIn what it says is, Senior Data Scientist (Data Engineering). You’ve been doing that since April, and prior to that, you spent six months as a senior data scientist, without the data engineering in brackets. So for our listeners who don’t know, what is data engineering and how did your role change there when you had the parentheses added on?

Mark Freeman: 00:14:27

Definitely. There wasn’t a really like official title change. I just talked to my manager. I was like, “Hey, I’ve been doing nothing but data engineering for the past six months, let’s just change the title, because it just makes it easier for me when I’m creating content or trying to network with people.” It just makes it easier for me to have that common language and talk about things, because me going to some InMail for a data engineer to be like, “Why is this data scientist talking to me?” Even though most data people are nice, but this helps kind of provide contextualization of the work I’m doing as a data scientist. Can you repeat your question again? I lost track of it.

Jon Krohn: 00:15:10

Yeah.

Mark Freeman: 00:15:11

How is my role different from data science to this?

Jon Krohn: 00:15:16

Yeah, now that you have the parenthesis.

Mark Freeman: 00:15:18

Yeah. Working in startups, titles are kind of wishy-washy; you just show up where you provide value. It just so happens that data engineering has been that. So I guess kind of a story of like, how did I even start going towards data engineering?

Jon Krohn: 00:15:34

Yeah, and what is data engineering?

Mark Freeman: 00:15:36

Yeah, so what is data engineering? That is a great question, and I am still trying to learn that as well, because I’m a data scientist coming into data engineering. And so I don’t want to talk as if I know all these answers for this new space, I’m still learning, but I would argue data engineering is the process of preparing data and organizing data within an organization, specifically to drive value, whether as a product or internal metrics. And so data engineering kind of expands across the whole data life cycle, which is kind of overwhelming how much they touch. But I mean, they’re the ones really thinking critically of how should data flow within our system reliably, and with scalability in mind. And so I would love to plug Ternary Data, Joe Reis, Matt Housley, because they have been really integral with both their content, and also just talking one-on-one on LinkedIn, to really help me transition, and answer a lot of my questions early in my journey. So I would definitely refer to them if you want a real definition, because they’ve really thought about that, but that’s how I really view it.

Jon Krohn: 00:16:46

We might have to see if they want to be on the show.

Mark Freeman: 00:16:50

Man. That would be awesome. You should definitely reach out to them, and I’m happy to make an introduction if you don’t know them already.

Jon Krohn: 00:16:56

Look out listener for a future episode with Joe Reis and Matt Housley. They’ve been on my radar for a while, so yeah, maybe we can-

Mark Freeman: 00:17:03

Yeah. They just have the book coming out for O’Reilly, the data engineering book. So out of all people, they would know the definition pretty well.

Jon Krohn: 00:17:11

Yeah. I mean, it sounds like a perfect episode.

Mark Freeman: 00:17:18

I think it is. I would love to listen to that, but for this episode, unfortunately, I’m not as amazing as them, but essentially, kind of how I got into that is if you ever worked in a startup, many times, you just have to wear many different hats, just to get things done. You’re building the infrastructure as you go. Things aren’t clear of like what’s exactly needed, and so it’s a lot of iterating really fast. So in my role, and this is about a year and a half ago, what kicked off me being obsessed with data engineering, I was asked to help the VP of Product understand what’s happening with one of our product surfaces. Like why is it performing the way it is? It was supposed to be a really easy question to answer, it was scoped for five hours.

Mark Freeman: 00:18:06

It took me 20 hours to source the data, to identify what was the correct business logic, how it flowed through a system, and then prepare the data before I even did the analysis. Then I also realized all my colleagues were doing this for every single analysis, every single time. I’m like, “This is horrible,” because we’re all doing different logic. We’re all doing different kind of approaches, and it’s just mind-numbingly hard, even though data cleaning’s a large part of data science, the cleaning wasn’t the hard part. It was actually understanding where in the code base did this data derive from, and how was it transformed in our product, so that way our analysis matches what’s seen in the product. For me, I was like, “I want to fix this. This shouldn’t happen again.” And that’s how I kind of got into it. I started really going around the company asking like, “How do you use data? What’s working, what’s not working? What do you wish you had?” We were a 50-person company at that time, so it was easy to ask the whole company, and do these little roadshow interviews. I learned a lot.

Mark Freeman: 00:19:11

From there, I started really diving into our data warehouse, trying to think critically of like how can we improve data access throughout the entire organization? How can we create data models that are easier for data scientists to answer the questions they’re asked? And more importantly, how can I make it possible for people to self-service their data needs, outside of data science? And through that, I started creating all these projects. I started getting traction and I wanted to go further and further upstream of like, “All right, how’s this data sourced? How can I create my own ETL pipelines? How can I fix ETL pipelines? How can I add data sources that are missing?” And I just haven’t been able to stop since then. I learned so much. Some people call me a data engineer within the company, but I’m not necessarily ready to take on that title because I still-

Jon Krohn: 00:20:03

Maybe someday.

Mark Freeman: 00:20:04

Maybe someday, but I feel like I have so much to learn.

Jon Krohn: 00:20:07

You can have an intermediate step where you’re a Data Engineer (Senior Data Scientist).

Mark Freeman: 00:20:14

Exactly. But you know, the core kind of theme through all this is that I have an amazing manager who saw me want to go into data engineering and she just kept on teeing up opportunities. Or when I asked she was like, “Yeah, go for it,” and making it very clear where the business impact would be at, so I can both learn, but also still drive impact at the same time.

Jon Krohn: 00:20:35

So cool. All right. So that gives us a great sense of how you got obsessed with data engineering and now have more and more of it in your role as a senior data scientist. But tell us about the transition before that, from just… It doesn’t say Junior Data Scientist, but just regular common Data Scientist to now a Senior Data Scientist. So you were at Humu for a bit over a year as a Data Scientist before you were promoted to a Senior Data Scientist. So is there at Humu, a strict definition of what the difference is between those two levels of seniority, and how did you pull it off? What was involved in getting that promotion?

Mark Freeman: 00:21:12

Definitely. So another quirk of startups is that they’re typically very flat. They don’t try to implement levels too quickly, at least for the startups I’ve been at, but when I joined, that was when they were… I think that was year three or four of the company, and they were like, “All right, we’re at a point where we need to implement levels,” so I just had correct timing where senior data scientist made sense. One thing about working at Humu, I work with like behavioral economics kind of thing, and making work better. There’s a field called I/O Psychology, and a lot of my colleagues-

Jon Krohn: 00:21:51

Industrial-Organizational Psychology.

Mark Freeman: 00:21:53

Yes.

Jon Krohn: 00:21:54

Not input-output psychology.

Mark Freeman: 00:21:58

So the data science team fits within the people science team, which is like the I/O psychologists. And so, because we have all this amazing expertise, so like PhDs and Masters, who have devoted their life to understanding workforces, they also create our levels. They also create our interview processes, and so my manager, and I’ve been asking her to share more publicly. One day I’ll convince her. She created this whole leveling system for different tracks on the data science team, whether it’s, what are the qualities of a Junior Data Scientist versus a regular Data Scientist, versus a Senior Staff. What does this track look like if you want to be an IC in analytics, an IC in product, or maybe you want to go down the manager path. [inaudible 00:22:49] Yes. Sorry for adding all the acronyms.

Jon Krohn: 00:22:54

That’s okay. That’s what I’m here for.

Mark Freeman: 00:22:54

Thanks for clarifying, but the amount of thought she put into it and she did a whole literature review on what this means and talked to other leaders. There is a very big distinction, at least at our company, but that may not be the case for every single company. I think we’re just at a unique environment where we just have the expertise to really think about this properly, and so they’re going to leverage that. My manager was able to very clearly articulate what were the things I was missing to make the jump to Senior Data Scientist, and-

Jon Krohn: 00:23:27

Such as?

Mark Freeman: 00:23:29

Such as… I think the key thing was moving away from being task-based to actually coming up with and organizing your own projects.

Jon Krohn: 00:23:39

Yeah.

Mark Freeman: 00:23:40

Creating a plan, delivering on projects, but more importantly, choosing projects that move the needle forward for a business metric. That’s the key difference. It was a strange transition because when I first started my career, you’re given a ticket and your work output is basically defined by your manager, whoever’s in charge of triaging the tickets. The more work you do the better. But when you’re trying to make that jump to actually being on a senior level, the more work you do is actually bad, because you’re just going to be burned out because… Hear me out, hear me out… Because here’s the thing, once you get to the senior level, you have the option to work on anything, especially at a startup. So you can be working on a lot of stuff and burn out. You can be working on a lot of things that are the wrong thing to focus on. So you’ll always be busy.

Jon Krohn: 00:24:36

I thought you were going to say, “Now that I’m a Senior Data Scientist, I get to spend all my time on the beach, that’s just outside my apartment.”

Mark Freeman: 00:24:41

Yep. Resting and vesting all day. For the record, I’m not resting and vesting at Humu.

Jon Krohn: 00:24:46

No, he’s in the closet all day toiling away.

Mark Freeman: 00:24:53

But that was a key distinction. So when I was trying to make that jump, I was really tired and almost burning out because I was trying to do everything. And my manager was able to say like, “Hey Mark, you’re doing a lot, which is great. But you need to prioritize better. You need to understand what’s going to be the high ticket things, and know when to take something off.” So I really worked on that skill with her. As I said earlier, one of those big projects was increasing data access within the company. So that was the project where I was able to identify an opportunity that drove the needle forward, was a strategic initiative and was something I kind of scoped out myself and was leading the charge on. And more specifically, thinking about the strategy side of it, and I would love to give a big shout out to Vin Vashishta, who does this Business Strategy For Data Science course. Humu paid for me to do that, and it completely changed how I approach data science. I think that was a really key moment for me, in shifting from that junior to senior level. Because from that-

Jon Krohn: 00:25:59

What’s that course called again?

Mark Freeman: 00:26:03

It was Business Strategy For Data Scientists, I believe. And I can find the exact title and send it to you.

Jon Krohn: 00:26:08

Yeah, that’d be great. We’ll pin it in the show notes.

Mark Freeman: 00:26:10

Yeah. It was really transformational for me. What that did for me was like, “Where would data science drive the most impact for a business?” And that shift allowed me to think more like a senior data scientist. And so before I would be like, “Well, I just need to do a whole bunch of analyses. Just answer a bunch of questions for the sake of doing data, like build a model or like put out a bunch of PRs.” After that course, I was like, “I’m doing this wrong.” I took a step back, and I thought, where in our value stream, kind of like how we’re providing value to our customers, from our products to our customers. And customers are the one that’s paying, we’re doing enterprise kind of sales for these things. Where is money coming in? And then from there, in that process, what’s really high touch, that’s really hard to do, that data and AI can’t touch? And for me, I was like, “Customer success.” They’re the individuals after sales, taking a customer and guiding them along the journey, to make sure they’re happy, but also they resolve issues and they’re in charge of something very important, preventing churn, and adding expansions for it.

Mark Freeman: 00:27:25

And so the other option was sales, but as a data scientist for sales, the customer is before they come in, so there’s less data on them. But customer success are existing customers, so we have all their sales data plus the customer success data and all their product data. So there’s more data, they’re in charge of this business metric of expansions and churns, and they’re doing high touch customer engagements. Therefore, through Vin’s class, I thought I need to empower customer success to be rockstars through data, in their meetings. How can I do that? What are the questions they’re trying to answer with data that they can’t right now, and how can I automate as much as possible with data, just to make them show up to every meeting and seem so informed. That was the key project that really pushed me to the senior data science level, and it was a huge shift in how I thought about approaching data science to drive impact.

Jon Krohn: 00:28:20

Amazing. That was such an excellent specific example of what it took you to transition from a common Data Scientist to a Senior Data Scientist. I think you did touch on a lot of the key things that I look for in similar kinds of leveling exercises, with people that work for me, that specific thing that you mentioned at the beginning of your explanation about going from being somebody that takes tickets. So for those of you who aren’t software engineers, or aren’t involved in the tech space, there’s this typical process within tech companies where you try to get the work that needs to be done onto these discrete tickets that indicate exactly what work needs to be done, to make some change to the software code base, or to develop some new machine learning model. And the more discrete you could make those tickets the better.

Jon Krohn: 00:29:17

You want to have an estimate of how much time you think it’s going to take, and Mark actually already alluded to that earlier with an example where he was talking about some ticket that was scoped for five hours, that took him 20. So obviously, the better your estimates of how complex the ticket is, the more easily you can project how long some big project deliverable, or some product feature that requires lots of tickets, is going to take. Also, there’s another term. You used the term PR, so same thing, if you’re not a software engineer already, that means Pull Request. And so this means that you’ve requested to integrate code that you’ve written with the code that everyone else has written. So the changes that you’ve made, to have that approved. Go ahead.

Mark Freeman: 00:30:13

I was about to say, something I would like to add, because I’m still on this learning journey. And again, my manager’s already created these levels, all the way up to past staff. And so now I’m looking at, “All right, what do I need to do to go to the next level? Do I want to go to the next level?” My manager’s in a lot of meetings. “Do I want to have as many meetings as her?” And one thing I’m learning is, and I don’t think I’m trying to gun for a promotion for this next cycle. I’m really enjoying the senior space. But one of the key things that she’s been teaching me a lot, is that difference between a senior and a staff, I think it really comes down to influence within an organization. How can you clearly identify a problem or opportunity, create a strategy around why it’s worthwhile to pursue, and then create the technical specs, create the whole kind of thought process, and generate consistent buy-in across the organization. Because when you’re at this level, you’re not working on little small tickets, you’re working on, like, “I want to change the infrastructure of our tech stack,” and the implications of that are huge, both good and bad.

Mark Freeman: 00:31:15

So getting that buy-in’s a very long process. Requires a lot of meetings, a lot of listening, a lot of feedback, give and take, and a lot of writing documents. Less coding and more thinking about the coding, and I think that’s the big difference I’m seeing at that staff level of being able to really do that, and then delegate it to other people, not as a manager, but as a technical lead.

Jon Krohn: 00:31:43

Nice. That’s a really good explanation of the differences between senior and staff. Really cool that level of influence, and kind of the size of these big transformative projects that the staff Data Scientist is taking on.

Mark Freeman: 00:31:56

I’m learning it now, it’s hard. It’s so hard. My manager’s really effective at it, and so I just watch how she navigates on Slack, and I’m just mind blown. “Wow. You’re really crushing it.” I feel really lucky I get to learn from someone so talented.

Jon Krohn: 00:32:10

She sounds amazing. What’s her name?

Mark Freeman: 00:32:12

Stefanie Tignor.

Jon Krohn: 00:32:13

All right.

Mark Freeman: 00:32:14

She is phenomenal. She’s the Head of Data Science and Insights at Humu. She has a PhD, I believe in Bio-Psychology, sorry, Stefanie, if I got that wrong. Somewhere in psychology, but she’s super talented, and is one of those individuals, who’s both a phenomenal leader, but also very technical, where she built an ML model last week for a POC. She can dig into our code and be like, “Change this, do this, have you considered this?” If you haven’t worked in data science or a technical role before, not all your managers will be technical and there’s pros and cons to that for sure, but definitely a huge plus of having a technical manager is being able to think critically about your work and collaborate.

Jon Krohn: 00:33:03

Sweet. So in summary, data scientists are task based. They get given tickets by their managers and execute on them. Senior data scientists come up with their own work that has commercial impact, and then staff data scientists are doing big, game changing, collaborative work across the organization. That could mean huge platform changes, huge model and approach changes, data flow changes that are going to have a big impact hopefully for the better. Super cool.

Mark Freeman: 00:33:34

Definitely. That’s definitely a generalization within different companies, as you know. It can be completely different for titles and levels across organizations.

Jon Krohn: 00:33:45

Yeah, I think that’s the general idea. I think that those are great. I agree with them, the way that we’ve defined them.

Mark Freeman: 00:33:51

I’m still learning myself, so it’s cool to hear other people kind of seeing this as well.

Jon Krohn: 00:33:57

Yeah. All right. So we have a sense of what Humu does. They’re nudging people into the direction of having better workplace behaviors. Do you have a couple of examples of use cases where data science has an impact on that Humu mission?

Mark Freeman: 00:34:12

Definitely. I think one of the key things is that we’re taking science, these best practices. We have all these amazing smart psychologists, who know the theory really well, and we’re applying it in a product form, and scaling that ability up. I really feel like I’m getting to impact people’s lives in a positive way. So it’s really cool to show up to work and be like, I’m making a difference, making the world better. One of the key things though, because at the end of the day, we’re still a business, people want to know, does it work? Here’s the science, this theory, does it work? And time and time again, what’s really cool with data science is that with our customers, we are able to take our product data, we’re able to take… Sometimes the customers might ask for specific analytics, but provide their business outcome data.

Mark Freeman: 00:35:05

And we’re able to compare the impact of nudges and our product on the outcomes of employees and managers and their effectiveness on various business metrics. It’s so cool. It’s super fun to see kind of the analyses they do. Unfortunately I can’t go into detail about those things, but the ability of data science to really paint those clear stories through analyses and partner with customer success, partner with sales and marketing to really get these really effective data stories out there. I haven’t necessarily been working on this, this much, all my colleagues who are amazing, do this, because I’ve been focused on data engineering. I’m trying to get the data for them to do these cool things. But I see in our meetings, they talk about it. I’m just always amazed by what they do, the research methodology they use. They’re doing some really cool statistics.

Mark Freeman: 00:35:58

And then through that, we’re able to pick up these patterns and we’re like, “All right, well how do we build more product around that? We’re seeing this pattern happen, how do we operationalize that? How do we build…” I’m just blanking right now, but I’m trying to think, what can I share? There’s a blog on the NLP model, how can we best identify for when people take surveys for employees. Say for instance, you have 1,000 people, you’re not going to read through 1,000 comments. How do you determine which comments are meaningful for understanding the pulse of your company? How do you determine which comments are negative or positive for a certain aspect that we told you, that’s changed recently, for your wellness score or something like that, and it’s making it work out, but you get the idea. So there’s a really cool data problem that you can work with and being at a startup, you’re allowed to iterate really quickly on those. Ask a lot of questions, see what sticks, find that product market fit. And then from there, really build cool applications.

Jon Krohn: 00:37:05

That is cool, Mark. That was a great example. So whether you’re wearing your Senior Data Scientist hat or your Data Engineer hat, what are the kinds of tools and techniques that you use day-to-day to make the magic happen?

Mark Freeman: 00:37:20

You know what’s funny is that when I learned data science, so much emphasis was on ML, R and Python, choose your poison. But essentially, SQL is where it’s at. SQL, I use all the time. I don’t know if it’s “sequel” or S-Q-L; I call it “sequel.” But I feel like it can be a hot, hot topic debate right there if I wanted to. I have a little flame war on LinkedIn, but essentially, I use SQL so much in my job. This actually came from my first data science role when I was at Verana Health, so before Humu. What I found was, we were working with massive data sets, billions of records, and we were using Spark for it. And so I’m excited. I’m a new data scientist. I want to use Spark. I want to use Python, and I’m doing my analyses and it’s taking me twice as long to do all my work, compared to my colleagues. I’m like, “What’s up, what’s happening?” I feel like I’m writing pretty solid code.

Mark Freeman: 00:38:28

Well, while I was waiting for my Spark and EC2 instance to try and chug along, they were all using SQL and Athena, and working two times as fast because their code just… They spit out the answers right away instead of waiting for their code. That was the moment I was like, “I need to learn SQL. I need to get good at this, because if I can do as much work as possible in SQL and just preparing my data and getting my data out there, and then going into an environment, whether it be Python or R or whatever it may be, it’s going to be really powerful.” And so going back to Humu, we use BigQuery, we’re a GCP company and I absolutely love BigQuery. I think it’s one of the coolest tools ever.
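
A minimal sketch of the SQL-first workflow Mark describes here, not his actual code: Python’s built-in `sqlite3` stands in for a managed service such as Athena or BigQuery, and the `events` table, its columns, and its values are all hypothetical. The grouping and averaging happen inside the SQL engine, and only the small aggregated result is pulled into Python.

```python
import sqlite3

# In-memory database as a stand-in for a warehouse (an assumption for
# illustration; the table and column names are hypothetical).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (team TEXT, score REAL)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [("a", 1.0), ("a", 3.0), ("b", 10.0), ("b", 20.0), ("b", 30.0)],
)

# Let the SQL engine do the grouping and averaging.
rows = conn.execute(
    "SELECT team, COUNT(*) AS n, AVG(score) AS mean_score "
    "FROM events GROUP BY team ORDER BY team"
).fetchall()
conn.close()

# Only the already-aggregated rows reach Python for further analysis.
summary = {team: {"n": n, "mean_score": mean} for team, n, mean in rows}
print(summary)
```

With a real warehouse the query text would look much the same; what changes is the client used to send it.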

Jon Krohn: 00:39:12

Yeah. We use it too. I love it.

Mark Freeman: 00:39:15

Yeah. It’s so powerful what you can do with it. And here’s the thing is that they literally have thousands of people and billions of dollars being pumped into this, to make it run well. Whatever they created, will outperform whatever script I create in 30 minutes. And so that’s why I go to SQL a lot, and these managed services for that. So I use SQL a lot for analyses-

Jon Krohn: 00:39:44

Just to quickly dig into a couple of ideas there before you move on to the next one is that, so with Google BigQuery, in case this wasn’t obvious to listeners, it is a SQL-like language. So the syntax is very similar. If you already know SQL, doing Google BigQueries will be very easy for you, but the managed service that they provide, allows you to use those SQL-like queries on massive databases really efficiently. So we use Google BigQuery as well, and I love it. We’re also a GCP company, so that’s going to be my bias. Just so people know that is the cloud provider that I use the most and that we made the decision to use, a while ago. So I support that. You also mentioned when you were talking about in your previous company, when you were talking about your colleagues using SQL, you talked about them using something called Athena. That isn’t something I’ve heard of before. So that’s something that allows SQL queries to be scaled up to very large data sets?

Mark Freeman: 00:40:40

So it’s been a while, and please double check, listener, but to my understanding, Athena allows you… So there’s Athena and there’s Redshift. These are AWS kind of services. Redshift’s a typical data warehouse. To my understanding, Athena allows you to query directly on top of your data lake, and so to reference that rather than going directly into the data warehouse. That is my understanding from three years ago.

Jon Krohn: 00:41:10

Cool. All right. That sounds great. All right. So I interrupted you; you were talking about data retrieval and then you were going to move on to analysis, I think.

Mark Freeman: 00:41:18

So essentially, we get this data ready in SQL. We create all these kind of data assets using SQL. From there, I bring it into either R. So we use R for a lot of our analytics. Honestly, I think R is probably one of the best for statistics, and the type of analyses we’re doing. We’re doing a lot of observational studies, especially like multilevel modeling, R is going to be a piece for that-

Jon Krohn: 00:41:45

Oh yeah. R is the best for multilevel modeling.

Mark Freeman: 00:41:47

Yeah. If you think about a reason why we’re doing multilevel modeling, it’s that you’re working with workforces. You have different departments, different teams, or different locations. Multi-level modeling really handles that very well.

Jon Krohn: 00:41:59

Yeah. It’s a technique that we have talked about recently on the show. So we talked about it in last week’s guest episode, in episode 585 with Thomas Wiecki. In that episode, we were talking about using Bayesian statistics with the PyMC library to have hierarchical models but if you’re not doing Bayesian statistics, then certainly R is, and maybe even if you are doing Bayesian statistics, which is relatively rare, R is the place to do it. That is how I really got into R in the first place, because prior to R, I was using MATLAB mostly for my analysis and it was Andrew Gelman and Jennifer Hill’s book called something like Multi-level/Hierarchical Regression Models in R and-

Mark Freeman: 00:43:01

Yeah, there’s a whole bunch of names for it.

Jon Krohn: 00:43:04

So yeah, you know that book. It’s like the Bible in this space. The hierarchical modeling space. I’m super excited at the time of recording, so by the time that you get to listen to this, listener, I will have had the opportunity to meet both Andrew Gelman and Jennifer Hill in person at a conference, New York R Conference, which at the time of recording is in the future but at the time of listening is in the past.

Mark Freeman: 00:43:32

Amazing. And also for the listeners, if you’re like, “Multi-level modeling? What is that? Why would I use that?” I think multi-level models, one, they are still confusing to me because they’re one of those things where you have to keep on trying over and over again. It’s not like riding a bike. I feel like I forget it if I don’t use it after a while so I have to review the literature because I’m in data engineering world now. I’m not doing the analytics as much. But it is one of my favorite models because I think it’s a great way to represent Simpson’s paradox, where Simpson’s paradox is the idea that you may think you have one trend going a certain direction, but when you stratify into the different groups, the trend’s actually going in the opposite direction. And so multi-level models specifically are really strong in handling that, so when that clicked for me in grad school, when I was first learning about this, I was just like, “This is the coolest thing ever.”
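
To make Simpson’s paradox concrete, here is a small illustrative sketch in plain Python. The two teams and their numbers are made up for the example; nothing here comes from Humu’s data. Within each team the trend is positive, yet fitting one line to the pooled data gives a negative slope.

```python
def slope(points):
    """Ordinary least-squares slope of y on x for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return sxy / sxx


# Made-up data: within each team, y rises as x rises.
groups = {
    "team_a": [(1, 10), (2, 11), (3, 12)],
    "team_b": [(6, 4), (7, 5), (8, 6)],
}

# Slope within each team, and slope when the team labels are ignored.
within = {name: slope(points) for name, points in groups.items()}
pooled = [point for points in groups.values() for point in points]
pooled_slope = slope(pooled)

print(within)        # both within-team slopes are positive
print(pooled_slope)  # the pooled slope is negative
```

The reversal appears because team membership is correlated with both x and y; stratifying by team, or modeling the teams explicitly, recovers the within-group trend.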

Jon Krohn: 00:44:34

Yeah, really great example there of how it’s so powerful. To kind of give a concrete example, a common thing that Andrew Gelman and Jennifer Hill do in their book, which for sure we’ll have a link to in the show notes is they do a lot of political research and so you can, with a hierarchical model, you can break up your data into groups like regions. So it could be school districts, or it could be voting districts and then you can have every single school district or every single voting district have their own little regression model, their own weights within this broader model and the broader model can pool together the individual ones and so it allows you to have this much more nuanced representation of what’s going on that as you point out Mark, can allow you, for example, to avoid making the Simpson’s paradox mistake. It provides a lot more nuance and specificity to the model so it means then that you can make predictions by school district or by voting district in that model. You don’t have to just have your, your aggregate country level model, and so it has that extra power.
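
The “pooling” idea can be sketched with the standard precision-weighted average for a varying-intercept model: each group’s estimate is pulled toward the overall mean, and groups with little data are pulled more. This is a simplified illustration, not a full multilevel fit. The district names and scores are invented, and the two variances (`sigma_y2` within groups, `sigma_a2` between groups) are assumed known here, whereas a real multilevel model would estimate them from the data.

```python
def partial_pool(group_values, sigma_y2, sigma_a2):
    """Shrink each group mean toward the grand mean.

    sigma_y2: assumed within-group variance (hypothetical, not estimated).
    sigma_a2: assumed between-group variance (hypothetical, not estimated).
    """
    all_values = [v for values in group_values.values() for v in values]
    grand_mean = sum(all_values) / len(all_values)
    estimates = {}
    for name, values in group_values.items():
        n = len(values)
        group_mean = sum(values) / n
        w_group = n / sigma_y2   # precision of the group's own mean
        w_grand = 1 / sigma_a2   # precision contributed by the population
        estimates[name] = (
            w_group * group_mean + w_grand * grand_mean
        ) / (w_group + w_grand)
    return grand_mean, estimates


# Invented scores: one district with a single observation, one with nine.
scores = {
    "small_district": [9.0],
    "large_district": [4.0, 5.0, 6.0, 5.0, 4.0, 6.0, 5.0, 5.0, 5.0],
}
grand_mean, estimates = partial_pool(scores, sigma_y2=4.0, sigma_a2=1.0)
print(grand_mean, estimates)
```

The single-observation district moves most of the way toward the overall mean, while the nine-observation district barely moves, which is the sense in which every group keeps its own estimate while still borrowing strength from the broader model.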

Mark Freeman: 00:46:07

Definitely. And I would give advice to any data scientist who wants to move more into the analytics side of things, I would highly encourage learning about observational studies and the techniques used for that space and the reason being is that, and this is more so going to my healthcare research background, you either have clinical trials or randomized controlled trials where you control the experiment, you control the data and therefore you can control for a lot of the bias. And there’s various steps for that, but many times as data scientists, we don’t control the data collection. Maybe we’re doing A/B testing, there’s a little bit of control there, but many times you’re getting data from a product that’s intended for something else, or you’re getting third-party data that’s coming in to combine it together. So the secondary analysis that you’re doing, observational studies are the way for me to conceptualize and think through how to approach this, and more importantly, reduce bias as much as possible, to do causal inference.

Jon Krohn: 00:47:09

Cool. Yeah, that was just a really good tip, and I do also recommend understanding those kinds of concepts well, when you can be making causal inferences or not. Great point. All right, cool. So we’ve got a good sense of what Humu does, what you do there, yeah.

Mark Freeman: 00:47:26

Oh, going back just to take a step back to, in addition, that was the R side. I also use a lot of Python. I probably use Python more than R for my work because I’m more on the engineering side, and so if I want to take the research or analytics and operationalize it and put it within our product, we’re a Python-based company. And so many times I won’t be working with BigQuery to access the data. I’m actually accessing our data directly within the database and from there building the various tools and I get to do the software engineering best practices. Doing the unit test, having classes, making things modular, getting the code review thoroughly from the engineering team and that’s where I have the most fun at. I really enjoy that.
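
As a small illustration of the practices Mark lists (a class, a modular method, and unit tests), here is a sketch using Python’s standard `unittest` module. The `ResponseSummary` class and its behavior are hypothetical, invented for this example rather than taken from Humu’s code base.

```python
import unittest


class ResponseSummary:
    """Hypothetical helper: share of responses marked as positive."""

    def __init__(self, responses):
        self.responses = list(responses)

    def positive_rate(self):
        # Fail loudly on empty input instead of dividing by zero.
        if not self.responses:
            raise ValueError("no responses to summarize")
        return sum(1 for r in self.responses if r) / len(self.responses)


class ResponseSummaryTest(unittest.TestCase):
    def test_positive_rate(self):
        summary = ResponseSummary([True, False, True, True])
        self.assertEqual(summary.positive_rate(), 0.75)

    def test_empty_input_is_rejected(self):
        with self.assertRaises(ValueError):
            ResponseSummary([]).positive_rate()


# Run the tests programmatically so the script does not exit afterwards.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(ResponseSummaryTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

In a real project the tests would usually live in their own file and run through a test runner in continuous integration, which is also where code review tends to happen.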

Jon Krohn: 00:48:16

Nice. It does sound like you are becoming more and more of a software engineer for sure and so I understand that right now Humu is doing software engineer hiring and yeah, this is… I’ve said this a dozen times on the Super Data Science Podcast before, but if you want to be super employable as any kind of role in data science; data analyst, data scientist, if you want to be super employable, definitely that list of things that Mark just listed around unit tests, it was perfect. You should rewind and listen to it again. If you want to be super employable, listen to that list and be making steps in the direction of being a software engineer, which could include being a data engineer or machine learning engineer and you will have so many job opportunities as opposed to being just a pure data scientist who can only work once the data have already been provided to them. Being able to take your machine learning model and put that into production, or being able to engineer the data flows into your machine learning model, that kind of defines machine learning engineering and data engineering, respectively.

Jon Krohn: 00:49:31

Those are hugely valuable skills and then those can be a stepping stone to being a backend software engineer, full stack developer and you can still use the data science background that you have in lots of ways. You’ll be a “full stack data scientist,” some people call that, but yeah, there’s just… We hear about this all the time on the show. Often our guests are doing data scientist hiring, but they are always hiring software engineers.

Mark Freeman: 00:50:01

Yeah. It’s a hard role to fill. And also just some quick advice for people, how I learned this stuff, when I came into this job, I had a desire to do more software engineering stuff. I didn’t have the skill set for it. And so my manager put me in a position to do these more engineering tasks and allowed me to stretch myself and code reviews is where I learned how to do these software engineering best practices. I had really patient and really kind and thoughtful mentors on the engineering side. Shout out Miriam. She is amazing. She brought me a lot of code review on large projects to really essentially get me up to speed to write production level code and code reviews, both doing code reviews and getting code review.

Mark Freeman: 00:50:43

So you may be asking yourself, “I don’t have this job yet. How do I even get this practice?” I would learn Git. Learn Git, learn how to create a pull request on your own GitHub, and then go on LinkedIn, go into your network and be like, “Hey, you’re a software engineer. You’re a data engineer. I did this code. Can you do code review for me?” Ask someone and you’ll find at least someone to say, yes, provide feedback on your code and then implement it. And so the next time you go into a job interview and they’re like, “What’s your experience?” “I did this whole side project. Oh, by the way, I even got code review from my network and implemented their feedback.” You’re going to look amazing to the hiring panel to show that type of initiative and that’ll really mimic the workflow.

Jon Krohn: 00:51:29

Super cool advice. Really great advice. So to get to the point that I was starting to make, at Humu, you’re hiring software engineers right now.

Mark Freeman: 00:51:38

Yes.

Jon Krohn: 00:51:40

I know that you have… Obviously you’re a data scientist who’s been hired there, other data scientists have been hired there. Currently at the time of recording there aren’t data scientist openings, but there are software engineering openings. Do you want to tell us about what kinds of roles those are?

Mark Freeman: 00:51:55

Yeah. We have front end, back end roles, and then we have data infrastructure roles as well, and all levels of experience as well. I recently conducted cross-team interviews and we have a role for zero to two years of experience. We really mean zero. Fresh out of bootcamp kind of thing, but also we have more senior roles as well for that. I was telling you earlier, I think, I honestly think we have one of the best engineering teams in the world. I may be biased because they just helped me out so much, but the amount of talent that I see… They have 20 years of experience. Our head of engineering, Sophie Alpert was on the React team for Facebook and so they do a lot of React stuff. She’s super talented. She solves all my problems. There’s so many other people just like her on the team, but most importantly, they’re willing to share their advice and really mentor you and help you grow and that’s the key part that’s really important to me is the amount of willingness they’re able to really help you out in your career, and I really appreciate them for that.

Jon Krohn: 00:53:09

Cool. All right. In the hiring that you guys do at Humu, I know that you often have input on who gets hired. What are the kinds of things that you look for in people that you recommend?

Mark Freeman: 00:53:22

Definitely. I think the key thing, if I don’t know you and you reach out to me asking for a job, I’m going to ignore it, because you didn’t give me any information about what you can do, what was your interest in the role, just asking for a job. I see that often and I think it’s more so just a lack of knowledge. They don’t know how to start the conversation per se. I would argue, do your research on what roles exist, do your research on what problems the company’s trying to solve and then when you message people say like, “Hey, I see this exact role. I’m interested in this,” or maybe your role doesn’t exist, but you say, “I see this company doing these problems. Here’s my previous experience. Here is how I can provide value.” And most importantly, if I don’t know you, if you provide your GitHub with a portfolio project, I will open it and I will review it.

Mark Freeman: 00:54:13

And many times when it’s good, I don’t care what your background is, if you can have a solid portfolio project, I will message that directly to my manager. I’m like, “You have to check out this person. Here’s his project. Here’s why it’s awesome.” So those are the key things. That’s a very high-level thing, but I’m happy to go into any other kind of specifics on the… Because I’ve been on the hiring process from technical screens to the case study reviews, to actual one-on-one reviews.

Jon Krohn: 00:54:40

I mean, I think that’s a great answer. We still have tons of other questions that we’d like to cover, so I think that is your high level tip for people if they’re interested in applying to work at Humu or elsewhere, having a strong GitHub portfolio that you can point to, definitely a great tip, Mark. All right. We’ve talked a lot about Humu and what you do there. Before Humu, you worked in three Bay Area health and wellness related startups. Verana Health, you already mentioned them earlier, and two other ones, Collective Health and Life Dojo. What have you learned from startups that you couldn’t have learned anywhere else?

Mark Freeman: 00:55:17

Man, that’s a really good question. I couldn’t learn anywhere else. I think two things that really pop out to me, one, is prioritization. Of course you’re going to have to prioritize for any job, but you really feel it in startups because it can literally change by the day. I’ve literally had projects where we’ve built all the requirements, we’re ready to go, and the day of we’re like, “Actually this new opportunity popped up and it’s very important, so this project’s completely done. We’re going to put on ice for a while,” and it’s not like till a year later I can bring it back up. So that happens and being okay with having a queue of tasks, so going from that task base to impact base, I have hope some [inaudible 00:56:02] tickets are months old and that’s okay because they’re in a queue to be something worthwhile to do, but not right now. And being in communication with your manager or your team, I’m like, “What’s the priority? What’s going to drive the needle forward?” and having that understanding of prioritization.

Mark Freeman: 00:56:18

People I would highly recommend talking to are product managers. Go find a product manager and learn how they are prioritizing product features, how they’re thinking about opportunities and thinking about the various tasks and keeping things on track. I think product managers are masters at prioritization, so I reach out to a whole bunch and be like, “How do you prioritize this? Because I suck at this and I want to get better.” And then the second thing, and I think this maybe might be an early career mistake, but I would look at tech stacks or see a problem and I would think to myself, “Why would someone code it this way? This makes no sense. Why would they do this?” And now that I’ve been here enough in startups, I know now I’m probably doing the same thing and someone’s probably saying the same thing about me later on, is that you build for the situation you’re in at that moment.

Mark Freeman: 00:57:09

As a startup, technical debt can be very useful. You don’t have to build the perfect thing from the beginning. If you try to do that, it’s actually a mistake because you don’t know if it’s valuable to the market, you don’t know how long it’ll probably take or what it would take to maintain. There’s too many unknowns. So instead, focusing on the minimal thing that needs to be done to show value and get more information to iterate on is way more important, and so sometimes ‘best practices’ may not align to your situation and it’s actually more advantageous to take a calculated shortcut or calculated risk to move and get more information to iterate on later than to go full on in. Once I realized that, I’ve had a lot more empathy for things I may see in the code base at any company.

Jon Krohn: 00:58:02

Nice. Super cool answer. I love that. Prioritization and building for the situation, not forever, are your two big takeaways from working in startups. Love it. All right. And then prior to getting involved in startups at all, you did a Master’s in Community Health and Prevention Research. What drew you into that field? I’m guessing based on the kinds of things you’ve already talked about, it was making an impact. Maybe you have more to tell us about the journey that that background plays in data science. Something that you already talked about earlier in the episode that I imagine plays in here, is that idea of understanding the differences between observational studies and randomized control trials being able to draw causal inferences, but maybe there’s something else as well.

Mark Freeman: 00:58:49

Yeah. I was not planning on being a data scientist. This is a relatively recent thing. I was going to be a doctor. I was very dead set on becoming a physician. Undergrad, I got my degree in Sociology, but took all these pre-med courses because I was like, “I’m going to go into medicine.” I volunteered at free clinics, all these different things. My master’s was actually at Stanford Med and I specifically did that program to make me more competitive for when I applied to medical school. And even funnier is that I was in classes, cross listed with medical students. I was taking classes with medical students. I’m like, “I’m at the dream school for medicine and I’m just not enjoying it.” I’m like, “This is not the right fit for me.” I think I knew this a long time ago, but it was just hard because I was just like, sunk cost fallacy, but I really came to terms with the fact that I actually didn’t want to be a doctor and that’s a whole other story [inaudible 00:59:56] another time.

Mark Freeman: 00:59:57

Message me on LinkedIn if you want to know why I don’t want to be a doctor. In hindsight, given the pandemic, I think that was the right choice because all my friends are in residency right now, middle of the pandemic and that’s been a very hard time for them. But what I did take was a public health modeling class. I learned stats and I learned R and I just became obsessed. I was like, “This is the coolest thing ever.” On top of my grad school courses, I was spending 20 hours outside of school, just learning the stuff because I couldn’t get enough of it. And more specifically I saw the MarI/O YouTube video where they build a deep learning model to beat a Super Mario level. I saw that video.

Jon Krohn: 01:00:41

Oh, the MarI/O video. That’s so good. I used to show that in the very first deep learning courses that I ever taught, so around 2017, 2018. I loved showing that in my final class as an example of a completely different approach so that they’re using a genetic algorithm to train Mario to excel at… Well, to train a computer agent to learn how to control Mario and excel at the game. It’s such a good video and I’ll be sure to include it in the show notes.

Mark Freeman: 01:01:15

Yeah. I saw that video and I was like, “Wait, you can do this with the computer?” And from there, I was just more so thinking, “Wow, okay. I don’t want to be a doctor, I still want to have social impact, I’m falling in love with coding and statistics, but more importantly, being at Stanford, I’m in Silicon Valley, I’m obsessed with startups.” I got to see what startups are like and kind of peek behind the curtain and see what it takes to do startups, and more importantly, I’m obsessed with the idea of, “How can I scale social change?” As a doctor, I can only impact people one on one at a time, which is very powerful and needed, but it just wasn’t the right fit for me because I’m a systems thinker and data science allowed me to itch that scratch, scratch that itch, but essentially is where I got to do statistics, I got to code, I got to have social impact and I got to be at scale and work in tech. It just seemed so perfect and from there I, not necessarily dropped all my mentors, but at the time they were all doctors. They had no idea what data science was and I started to shift. It wasn’t immediate. It took me a couple years including grad school and then an additional year picking up a second job in analytics and coding on the train to work, to really pick up the skills where I got my first data science job and I’m obsessed. I love it so much.

Jon Krohn: 01:02:46

It shows and yeah, I love the impact that you’re making with it. I already said this in episode number 573 with Doris Xin and she very politely just said, “Doctors are really valuable and I really appreciate their work,” but I made the point then, the same kind of point that you’re making now, that there are organizations like 80,000 Hours that are dedicated to doing research to try to help you have the most impactful career. One of the founders of 80,000 Hours was on this show last year in episode 497, Ben Todd, so if you want to have an episode about how to know the latest opinions on how you can make an impact in general, as well as specifically as a data scientist and how to think about how you might want to optimally make the most impact in your career, then episode 497 is a great one to check out. But back to Doris: in episode 573, I brought up how 80,000 Hours’ research hypothesis is that a lot of people think, “If I want to make a big impact in the world, I want to be a doctor.” You’re like, “Doctors are helping people.”

Jon Krohn: 01:04:06

But because, in any given year, there’s a set number of places in medical school, whether or not you make the decision to become a doctor, somebody else who is also super qualified will take your place. And so on average, the average doctor is making the same amount of impact as other doctors and you can’t really move the needle and make a big impact. So by choosing to become a doctor, on average, your net impact is the same as if you hadn’t chosen to become a doctor. Whereas in other kinds of things like coming up with the idea for Humu as a startup to be like, “Let’s take these kinds of nudges, these behavioral changes that can make a big difference to the way that people work and the way that they lead and the way that they are as teammates. Let’s take that idea, digitize it, use data science to have machine learning models, that maybe can even do things, once we collect enough data, can do things that wouldn’t have even been possible before we started collecting these data and building these machine learning models.” And then all of a sudden you can scale up these nudges and be making impact on millions of people all over the world, the way they feel in their jobs every week and so as a data scientist there, or as the founder of that company, you’re making a massive impact on the world. So data science-

Mark Freeman: 01:05:36

I think something that really highlights… I was talking to one of my colleagues who worked at FAANG, doing the IO psychology stuff, and they were doing people analytics and they were like, “I’m making work better for literally the top 1% of workers.” They’re at a FAANG company in tech as a software engineer, right? Their lives at work are pretty nice compared to the rest of the world, right? And so thinking about how you could take that product and take that same impact you have for a FAANG company, but say for instance, I’m just making names up right now, but a very large retailer where you’re making minimum wage, right? I’ve worked jobs like that. They’re pretty hard, especially if you have pretty bad managers in that. You may not have the choice to leave that job, but what if you had a way to improve that work experience for everyone and take off that cognitive burden, that cognitive load, where people are able to at least not have their job be hard for them at a personal level? So the ability to take a product and scale it for the masses, that’s why I’m obsessed with startups because they figure out a systematic way to take a novel idea and apply it to as many people as possible.

Jon Krohn: 01:06:54

Yep. Love it. Yeah. We’re touching on a lot of the most impactful career choices you can make. Data scientist, software engineer, startup founder, and AI safety researcher is 80,000 Hours’ number one pick and you can hear a lot about that, not only in the Ben Todd episode, which was, as I said, 497, but also in a more recent episode with Jeremy Harris, 565. We talk a lot about artificial general intelligence in there and how AI safety research could be critical to preventing humans from being wiped out by machines. Super cool. So glad that you brought that up, Mark. In addition to this topic that we’ve just been talking about, making an impact, this question probably isn’t going to be a big surprise for you or the audience, but you’ve volunteered in civil rights organizations, health advisory committees, you’ve interned in a pediatrics advisory program and more recently you’ve been mentoring entry level data scientists. It’s clear that you have a mission to help the broader community through data, especially the most marginalized people. What motivates you to be doing all this?

Mark Freeman: 01:08:17

There’s a lot of factors in this. I think keeping my… I come from a community health background, but I think the most core component that really just resonates with me is, my dad was on a board of a non-profit and so growing up, I would always go to those non-profit meetings because he had to watch me, so I just basically grew up in a non-profit boardroom, watching and more importantly, as a person of color, I saw people of color in power, making pivotal changes in my community. And for context, you can’t see me on the podcast, I’m black and Mexican. The reason that’s important is that growing up, such as in high school, I got bullied a lot and I received a lot of racism, which is very unfortunate. So having that juxtaposition of, outside of this, receiving racism, people saying I couldn’t achieve what I want to achieve because of the color of my skin, to see the complete opposite and people make change at a systematic level…

Mark Freeman: 01:09:24

Specifically, the non-profit he was helping advise did down payment assistance for first time home buyers, for families to improve property ownership, which was really cool. So I grew up seeing them at a leadership level and so one, that made it so that I knew it was possible for me to have impact and just be aware that there are ways to impact at a systematic level. The second thing is, both my parents didn’t graduate from college and especially my mom, she had a very hard upbringing and so she made sure I was very grounded and aware of the challenges people face, exposing me to various people. One of the key things I noticed about how I ended up graduating from college, being able to transfer from community college where my same friends in community college didn’t: they didn’t have a support system.

Mark Freeman: 01:10:16

My parents made sure they encouraged me to make it through and make it to the next level and achieve my dreams. My mom was very adamant about that because she didn’t have the opportunity. My friends who didn’t, who kind of got caught up in the systems, they didn’t have that support. And so combining that, knowing it’s possible, knowing that a blocker to that is the lack of support, it just made me obsessed with how can I build systems to reduce the barriers for people to pursue a well-meaning life. And healthcare seemed like the obvious thing at first for a long time, especially community health, but as I’ve grown in my career, I see data as that thing to scale up that impact to make sure my same friends had that support as well. My mom, [inaudible 01:11:02] to my mom had that support and this just really stuck with me.

Jon Krohn: 01:11:08

That was a beautiful answer. I had no idea that was coming and yeah, really wonderful answer, and I can see why you are so hell bent on understanding data engineering, hierarchical models, everything that you can because you’re mission oriented. I love that, Mark. So cool. It’s been great.

Mark Freeman: 01:11:32

I’m literally crazy enough to think that I’m going to change the world. And even if I don’t, the fact that I keep on trying, I think that’s something worth living for and change the world for the better. That’s the key thing.

Jon Krohn: 01:11:46

I absolutely love it. And then another way beyond data and healthcare, another way that you are making an impact is with a Decentralized Autonomous Organization, also known as a DAO. So you’re a data science advisor to CharlieDAO, which is ‘building Web3.0’ things.

Mark Freeman: 01:12:12

Yep. Very descriptive.

Jon Krohn: 01:12:14

So what is Web 3.0, what interests you about it, and what do you do for CharlieDAO?

Mark Freeman: 01:12:20

Definitely. First and foremost, shout out to Carlos Mercado. He’s on LinkedIn. He used to be a data scientist. He basically was like… He got obsessed with Web3. He has an economics background. He quit his data science consulting job, went full time into Web3. He wrote a book that’s free to download that introduced me to blockchain and decentralization. There’s a whole bunch of buzzwords. Don’t worry about it for right now. I can explain later, just ride along with me here. I knew blockchain was going to be something important because I went to a conference in healthcare, a precision medicine conference, where they talked about how blockchain can be used for EHRs and they had a bunch of professors-

Jon Krohn: 01:13:06

[inaudible 01:13:06] EHR?

Mark Freeman: 01:13:07

Electronic Health Record.

Jon Krohn: 01:13:08

Got you.

Mark Freeman: 01:13:10

When you go to the doctor’s office, the notes the doctors take. So Electronic Health Records are really important for data science in healthcare. Very important. And also for billing and whatnot, and managing those systems is very hard. It’s a fractured data system. And so the argument they were making was that the blockchain would provide a system to bring these silos together, to help improve patient care. So they had a bunch of cryptography and medical experts explaining. That was 2018, so I was like, “Okay, blockchain’s this thing. It’s really cool. I don’t know when it’s going to happen, but I’m paying attention.” Carlos came out with that book, I read it. He explained how it can be helpful, mainly not for the US, because in the US, we, for the most part, have a trustworthy financial system regardless of your political leanings, compared to other countries where people may not even be able to trust putting money into their bank.

Mark Freeman: 01:14:09

So the argument that he was making in his book was that Web3 decentralization allows countries and members of society who do not have a trustworthy government to put and maintain their money, to have a separate source, and that’s a way for wellbeing and social impact in a way through that. So when I read that, my mission-driven mind, I was like, “I need to go to this now,” and I just started learning about it. I joined a DAO and the best way to describe it, a DAO is a Discord with a bunch of Web3 folks who are tied to a mission and sometimes they have a bank account. Our current one doesn’t, but some do. Some of them have millions of dollars. It’s a Discord with millions of dollars and they do things with it, which is completely wild.

Jon Krohn: 01:14:54

So, Discord’s basically a bank account, right?

Mark Freeman: 01:15:00

And if this all sounds wishy washy, it’s because it is. It 100% is. This is a relatively new space. People are really trying to figure out with crypto how we can use it to build tools, and so the reason why I was like, “You know what? I want to spend a lot of my free time with CharlieDAO,” dealing with this, is one, I just found it really interesting, the same way I felt about data science and learning statistics. I had the same curious obsession with Web3 and crypto and how they work. How does it work technologically? How can you build tools on that to help people? And more specifically, I got really into NFTs. NFTs may be a word that people are like, “Ugh, who is this guy? Why is he talking about NFTs?” There’s a lot of hate for it. There’s also a lot of love. It’s very polarizing. Specifically, I got really into analyzing NFTs. I think Web3 and specifically blockchains, if you’re a data professional, I think you absolutely need to get into blockchain. The reason being is that even if you don’t care about crypto, it is one of the largest real world data sets that anyone can access and do things with.

Mark Freeman: 01:16:13

As a data professional, you should be dreaming of this opportunity and it’s growing every single second. So with that, you can do a lot of really cool analysis with behavioral economics. You can do a lot of cool analyses. I’m currently doing a network analysis on NFT transactions, specifically to track fraud. So you could build these really cool things. That’s why I became obsessed with it: you can build tools in a new space that’s emerging. If it becomes the next big thing, like the internet, great, you’re there early, you know how to build for it. If not, you still have a lot of fun working with data, at least for me personally. So that’s the reason why I got obsessed with it. It’s for fun and it’s how I like to spend a lot of my free time. And so for CharlieDAO I focus on doing analytics, helping build the community any small way I can. And the thing about a DAO is that it’s one of those things where you can hop in and hop out. I may have a few months where I’m like super into it. I’m on it, talking to people all the time. I may have some months where I post maybe once the entire month, but you maintain a community, you keep this momentum going to really just work on problems, and specifically for CharlieDAO, his goal was to get a bunch of crypto people, data scientists, and software engineers together and just have them talk about what if?

Mark Freeman: 01:17:39

“What if we can build this? What if we could do that?” and then provide a whole bunch of support to create an MVP, and then from there potentially take things to the market. So we’ve had one product, Deep Freeze, which is essentially like CD accounts. It goes into fancy economics that I don’t understand, but they created a crypto protocol to essentially create CD accounts for your crypto and that’s one of the big projects they worked on. They’re specifically targeting large institutions who want to reduce their risk. So even though they may get a fraction of a penny, a fraction of a penny of a billion dollars is a lot. So those are the things they’re going after. Other things are like NFT analytics dashboards. There’s a lot of cool stuff happening there and yeah, that’s… It’s so new and it’s just a really fun space to be in if you can get past all the scams and past all the hype. If you look at the technology from a data professional lens, you’ll be amazed at the opportunities there for you to grow your data skills.
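
[Editor's sketch] To give a feel for what a network analysis of NFT transactions could look like, here is a minimal Python illustration. It is not Mark's method, which the episode does not describe. It assumes a made-up list of transfers (the token ids, wallet names, and the `tokens_with_round_trips` helper are all hypothetical) and flags tokens that come back to a wallet that already held them, one simple pattern an analyst might want to look at more closely.

```python
from collections import defaultdict

# Hypothetical transfer records: (token_id, seller_wallet, buyer_wallet).
# These rows are invented for illustration; real data would be pulled
# from a public blockchain.
transfers = [
    ("nft-1", "A", "B"),
    ("nft-1", "B", "C"),
    ("nft-1", "C", "A"),  # token returns to wallet A, closing a loop
    ("nft-2", "D", "E"),
    ("nft-2", "E", "F"),
]


def tokens_with_round_trips(transfers):
    """Return sorted token ids that were sold back to an earlier holder."""
    seen_owners = defaultdict(set)  # token_id -> wallets that have held it
    flagged = set()
    for token_id, seller, buyer in transfers:
        seen_owners[token_id].add(seller)
        if buyer in seen_owners[token_id]:
            flagged.add(token_id)
        seen_owners[token_id].add(buyer)
    return sorted(flagged)


print(tokens_with_round_trips(transfers))
```

A round trip on its own does not prove fraud; it is only a candidate signal that would need context such as prices, timing, and wallet relationships.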

Jon Krohn: 01:18:43

Cool. I can see why you’re so excited about CharlieDAO. Sounds like they’re doing fascinating things. And for those of you wondering what a CD account is, it’s a common financial instrument in the US. It’s called a Certificate of Deposit and so it’s like putting money into a savings account, but you put it in for a fixed term. So you put money, it has to be in the account for a year or two years, and then because you are locking it in, you’ll typically get a better interest rate than if you just put it into a savings account where you can take the money out anytime. Awesome. All right. Mark, what a cool episode. I love your energy and I’ve learned a ton from you.
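
[Editor's sketch] The trade-off Jon describes can be put into numbers with a small Python example. The deposit amount, both interest rates, and the `balance_after` helper are hypothetical values chosen only to illustrate the idea, and interest is assumed to compound once per year.

```python
def balance_after(principal, annual_rate, years):
    """Balance after compounding once per year at a fixed annual rate."""
    return principal * (1 + annual_rate) ** years


principal = 10_000.0   # hypothetical deposit
savings_rate = 0.01    # hypothetical savings account rate (1% per year)
cd_rate = 0.03         # hypothetical two-year CD rate (3% per year)
term_years = 2

savings_balance = balance_after(principal, savings_rate, term_years)
cd_balance = balance_after(principal, cd_rate, term_years)

# Extra interest earned in exchange for locking the money in for the term.
extra = cd_balance - savings_balance
print(round(savings_balance, 2), round(cd_balance, 2), round(extra, 2))
```

Under these assumed rates, the locked-in deposit ends the two years a few hundred dollars ahead; the cost is that the money cannot be withdrawn freely during the term.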

Mark Freeman: 01:19:21

I’ve had a lot of fun being here. I really appreciate you reaching out to have me join.

Jon Krohn: 01:19:26

Yeah. It’s a blast in your closet, Mark. It’s been great for me. I’m on your screen in your closet. Look at me.

Mark Freeman: 01:19:32

The podcast listeners are going to be like, “What? What is he talking about?”

Jon Krohn: 01:19:40

I just am. I’m right there. My virtual presence is in the closet pressed up against the wall with just Mark there. We’re getting near the end of the episode. Regular listeners will know that now is the time that I ask for a book recommendation. What have you got for us Mark?

Mark Freeman: 01:19:58

I recently got to listen to Katy Milkman, who is, I believe, a researcher out of Wharton on behavior change. She recently had a book come out called How to Change. How to Change: The Science of Getting from Where You Are to Where You Want to Be. I just ordered this book, so I’m still on the first chapter, but the reason why I’m okay recommending it is that I listened to her talk about the book and she went into all her research and talked about it and it was phenomenal. It was exactly the challenges I was facing and one of the key things she’s trying to get at is that behavior change is hard, but if you employ tactics to make the hard things fun, or group it up with certain skills or certain behaviors that you already do well, you can really flip the cycle and really build habits to improve your life.

Mark Freeman: 01:20:53

As a community health person, I’m really big into behavior change because that’s core to health and wellbeing. That’s what I learned from community health, and being at Humu, I get to learn about behavior change from these experts and learn how to put that into a product. So this book kind of ties that all together for both my personal life, my academic interests, and then also just my job as well.

Jon Krohn: 01:21:18

Love it. All right. We’ve already mentioned that you have 20,000 followers almost, at the time of recording. Probably definitely by time of publishing this episode, and so lots of people like to get your daily tips on LinkedIn. That’s obviously, I assume the main place where people should be following you. Where else can people follow you?

Mark Freeman: 01:21:43

I think LinkedIn’s the best place to reach me. Like I said, I try to post Monday through Friday and more importantly, the way I try to create my posts is as a meeting ground for people who are learning just like me, or maybe at the beginning of the journey (I’m much further along in my journey now), and then people who are way more experienced than me. I create content that piques the interest of both sides to have conversations so we can all learn from each other. So I highly encourage either just checking out my content and reading the responses people are leaving, because there are some really talented people giving me great advice. Again, 20,000 mentors. Or you can ask your own questions. I’m happy to respond on those posts. And typically, I talk about data science, what I’m currently learning at my job right now, maybe the challenges I face and mistakes I made, and they’re all there for us to learn together.

Jon Krohn: 01:22:34

Sweet. All right. Thanks a lot, Mark. Definitely follow Mark to get the latest that’s going on in his career. All the mistakes he’s making. That’s what I want you to read about.

Mark Freeman: 01:22:46

They’re great mistakes. Only the best of mistakes.

Jon Krohn: 01:22:53

Sweet. All right, Mark. Thank you so much for being on the show and we’ll have to have you on again sometime to check in, see how you’re doing.

Mark Freeman: 01:22:59

That’ll be great. Thanks for having me.

Jon Krohn: 01:23:06

What a fun episode today. I pretty much always have a really good time with guests, but Mark in particular was an absolute hoot and had lots of meaty data science and software engineering knowledge to share with us. In today’s episode, Mark filled us in on the AIDA (attention, interest, desire, action) model he uses to guide his catchy content creation. He talked about how Humu leverages data and machine learning to nudge people in the direction of more effective workplace behaviors, how junior data scientists are task-based while senior data scientists come up with their own commercially impactful work, how he loves SQL, particularly Google BigQuery, for efficiently extracting the data he needs from a large database, how he loves hierarchical models in R for handling the nuance of subgroup data and avoiding Simpson’s paradox, and how all data scientists should perhaps be interested in Web3 because of the massive amount of publicly available data stored on the blockchain.

Jon Krohn: 01:24:03

As always, you can get all the show notes, including the transcript for this episode, the video recording, any materials mentioned on the show, the URLs for Mark’s LinkedIn profile, as well as my own social media profiles at www.superdatascience.com/587. That’s www.superdatascience.com/587. If you enjoyed this episode, I’d greatly appreciate it if you left a review on your favorite podcasting app or on the Super Data Science YouTube channel. I also encourage you to let me know your thoughts on this episode directly by adding me on LinkedIn or Twitter, and then tagging me in a post about it. Your feedback is invaluable for helping us shape future episodes of the show. Thanks to my colleagues at Nebula for supporting me while I create content like this Super Data Science episode for you. And thanks of course, to Ivana Zibert, Mario Pombo, Serg Masis, Sylvia Ogweng and Kirill Eremenko on the Super Data Science team for managing, editing, researching, summarizing, and producing another sensational episode for us today. Keep on rocking it out there folks, and I’m looking forward to enjoying another round of the Super Data Science Podcast with you very soon.

