SDS 561: Engineering Data APIs

54 minutes
Business, Data Science

Ribbon Health CTO Nate Fox sits down with Jon Krohn to discuss APIs: how to build one from scratch, how to ensure its uptime, and how to leverage ML models to improve the quality of healthcare delivery.

About Nate Fox

Nate Fox (“Fox” to his colleagues) is co-founder and CTO of Ribbon Health, an API data platform that powers high-quality, cost-effective, and convenient care decisions. Nate got his start in engineering at MIT before going to Microsoft. He couldn’t resist the “startup bug” and joined Unified, where he worked closely with leadership as an engineer to launch new data products that helped drive new lines of business. Fox attended Harvard Business School with his now co-founder, Nate Maslak, who shared his vision of powering smarter healthcare decisions with data. When he’s not Ribboning, Nate loves spending time with his wife and two kids, Ary and Liana, who always remind him of the importance of closing the terminal for playtime!

Overview

It’s all about application programming interfaces (APIs) and machine learning in this week’s episode, as Jon Krohn sits down with Ribbon Health co-founder and CTO Nate Fox to discuss how data science can significantly improve healthcare data.

Ribbon Health provides healthcare enterprises with an API layer for accurate data on doctors, insurance plans, and the cost and quality of care. Healthcare data, however, is highly fragmented and scattered across thousands of sources, and the team at Ribbon Health aims to fix this so that others can make timely, high-quality, and cost-effective decisions.

Jon and Nate kicked off the conversation by defining APIs, explaining to listeners that APIs let information flow back and forth between programs and abstract away complexity for programmers.

Ribbon Health clearly provides critical solutions, but how did the team go about designing its API? First and foremost, Nate emphasizes that empathy drives Ribbon’s API. Drawing on experience from previous roles and on feedback from current users, the team also refined the API through market experimentation, which returned strong signals about which endpoints resonated most with customers.

Next, Jon wondered how Ribbon ensured the uptime and reliability of their API. “It’s so critical for us to be reliable because people are building their applications and their healthcare experiences on top of our API,” Nate notes. “And the stakes are really high because if we go down, people can’t access care at that moment. So we take it very seriously.”

As a starting point, Ribbon ensures reliable uptime with a fantastic API team. Nate also explains that they leverage the load-balancing infrastructure that AWS offers to ensure that they can handle spikes in demand. This is in addition to a variety of testing and monitoring. Datadog, in particular, has been a helpful tool for Ribbon Health’s API reliability.

Next, it’s time to step into Nate’s shoes for a typical day as the CTO of a fast-growing, API-focused data company. His guiding principle is a single question: “What can I do to best scale and grow the organization more broadly?” He approaches this objective through three main pillars:

Team building: meeting great talent and helping them understand Ribbon’s mission.
Processes: refining the procedure for building products and thinking about how his teams make decisions.
Technical Architecture: growing the technical architecture of the organization.

Tune in to the full episode to learn more about Ribbon Health’s processes, including how they train a model to assign confidence scores, the tools they depend on heavily, and their values-driven culture.

In this episode you will learn:

What are APIs? [13:20]
How Ribbon Health’s data API leverages ML models to improve the quality of healthcare delivery [16:08]
How to design a data API from scratch [20:00]
How to ensure the uptime and reliability of APIs [25:28]
How Ribbon uses knowledge graphs, manually labeled data samples, and an XGBoost model with hundreds of inputs to assign a confidence score [27:14]
What is Nate’s day-to-day like? [34:34]
Nate’s favorite tool for easily scaling up the impact of data science [37:40]
The qualities Nate looks for when hiring data scientists [39:50]
How scientists and engineers can make a big social impact in health technology [42:50]

Items mentioned in this podcast:

Neptune.ai
Einblick
Ribbon Health
AWS Lambda
Datadog
XGBoost
How Will You Measure Your Life? by Clayton M. Christensen, James Allworth, Karen Dillon
ScaleUp:AI
Nebula

Episode Transcript

Jon Krohn: 00:00

This is episode number 561 with Nate Fox, CTO of Ribbon Health. This episode is brought to you by Neptune Labs, the metadata store for MLOps and by Einblick.ai, the collaborative way to explore data.

Jon Krohn: 00:18

Welcome to the SuperDataScience podcast, the most listened-to podcast in the data science industry. Each week, we bring you inspiring people and ideas to help you build a successful career in data science. I’m your host, Jon Krohn. Thanks for joining me today. And now let’s make the complex simple.

Jon Krohn: 00:49

Welcome back to the SuperDataScience podcast. We’ve got a real nice one for you today with Nate Fox. Nate is Chief Technology Officer at Ribbon Health, a New York-based API platform for healthcare data that has raised $55 million, including a $43 million Series B last year from some of the biggest names in venture capital like Andreessen Horowitz and General Catalyst. He previously worked as an Analytics Engineer at the marketing startup Unified and as a Product Marketing Manager at Microsoft. He holds a bachelor’s in mechanical engineering from the Massachusetts Institute of Technology and an MBA from Harvard Business School.

Jon Krohn: 01:27

Today’s episode has some technical data science and software engineering elements here and there, but much of the conversation should be interesting to anyone who’s keen to understand how data science can play a big part in improving healthcare. In this episode, Nate details what APIs are, how you design a data API from scratch, how Ribbon Health’s data API leverages machine learning models to improve the quality of healthcare delivery, how to ensure the uptime and reliability of APIs, how scientists and engineers can make a big social impact in health technology, his favorite tool for easily scaling up the impact of the data science model to any number of users and what he looks for in the data scientists he hires. All right, ready for a great episode? Let’s go.

Jon Krohn: 02:16

Nate, welcome to the SuperDataScience podcast. It’s awesome to have you here. Where in the world are you calling in from?

Nate Fox: 02:24

Jon, thank you so much for having me here. Super excited to be here today and talk about all things data science and whatnot. So yeah, my name is Nate Fox and I’m currently calling in from Westchester actually where I live with my family, my wife and my two kids.

Jon Krohn: 02:38

Nice. So near New York, you commute by train, I guess and yeah, you work in Manhattan.

Nate Fox: 02:45

Yep. That’s right. So big, big fan of the suburbs. Love the little extra space, especially with the kids and everything. But yeah, I walk to the train from where we live, take the Metro-North to Grand Central, and then take the 6 down to the Ribbon Health office.

Jon Krohn: 03:00

Nice. Yeah, very specific. Now people can follow your route into work.

Nate Fox: 03:03

Yeah, yeah [crosstalk 00:03:05].

Jon Krohn: 03:04

Look out for our listeners waving at you on the 6 train. Brilliant.

Nate Fox: 03:10

Absolutely.

Jon Krohn: 03:11

So we know each other through Austin Ogilvie. He was on episode number 535, and I’ve known Austin for many years. A pillar of the New York City data science startup community, he’s made a number of amazing introductions to me for guests. And now you are another one of them. How do you know Austin?

Nate Fox: 03:31

Oh, Austin’s awesome. So I went to HBS, Harvard Business School, with someone named Sam Li from Laika. When Sam was working on his new startup idea in the security and compliance space, he was working with Austin, one of his co-founders. And the funny thing is that both our companies have hit their Series B recently, and we both started around the same time. So it was kind of fun to see Austin’s company scale to its current size, and we had been watching each other grow our businesses. And fun fact: we actually use Laika at Ribbon Health. So we’re big fans of what Austin’s working on.

Jon Krohn: 04:08

Nice. Yeah. So compliance-as-a-service is kind of their big selling point, and yes, they’ve had some great raises lately, and you have too. You’ve been the CTO of Ribbon Health since co-founding it six years ago, and you’ve now raised $55 million, kind of going neck and neck, as you mentioned, with Austin there at Laika. That includes a $43 million Series B in 2021 from some of the biggest names possible in the venture capital world like Andreessen Horowitz and General Catalyst. So congratulations. That’s amazing. No doubt it’s because of that compliance-as-a-service offering from Laika that you’re using that’s…

Nate Fox: 04:50

Yes, 100%. We owe it a lot. Yes. Thank you, Austin.

Jon Krohn: 04:56

And so yeah, so what does Ribbon Health do? What problem does it solve?

Nate Fox: 05:01

Yeah, absolutely. So healthcare data is highly fragmented across thousands and thousands of sources. This is information like: Where are doctors actually practicing? What insurance do they accept? What conditions do they treat? How cost-effective are they in delivering care? And all this data is sitting in thousands and thousands of different sources. For instance, here at Ribbon Health, we see hundreds of locations for a single doctor, which is just crazy, how much information is out there.

Jon Krohn: 05:30

So-

Nate Fox: 05:30

And not only that-

Jon Krohn: 05:32

You look up the name of the doctor, and that doctor can be associated with a hundred different medical addresses?

Nate Fox: 05:39

That’s correct, across potentially thousands of different sources. It’s really, really fragmented data out there. What we do is aggregate all of that data to try to find the signal in the noise, applying machine learning and data science in a lot of innovative and interesting ways. It’s a really gnarly data science problem, and the team here has done an incredible amount to address it.

Jon Krohn: 06:06

Nice. And we are going to dig into some of those specific data science problems. So I am fascinated by the story behind the way that you got started because you didn’t start Ribbon Health with exactly this product, right? It was around 2018. You realized that there was this big data quality issue and kind of started focusing on the API that you’re developing now today.

Nate Fox: 06:32

Yep. That’s absolutely right. Our mission is to help make every healthcare decision cost-effective, high quality and convenient. That’s never changed, but what has evolved is the how. We started as a care navigation platform trying to make it easier for patients to find care, and we sold our service, our software, to employers as a benefit for their employees. So my co-founder Nate and I were really helping these patients find and utilize their healthcare: finding them doctors, booking them appointments, et cetera. And-

Jon Krohn: 07:08

I see. So was it like a self-service UI or you would actually have people serving these users?

Nate Fox: 07:16

So it was a self-service UI initially, with our backend data powering it, which ended up becoming the infrastructure for the company that exists today. But getting back to your question, yeah, it was self-service. One thing in healthcare that has driven significantly better outcomes is proactive outreach. So we would have users of this service, and we’d proactively reach out to them and say, “Hey, did you know that your healthcare plan has a free PCP appointment? We’d love to book it for you. Click this button and we’ll take care of everything for you.” Because people would say, “Yes, find me a therapist, find me a cardiologist, find me a PCP.” And-

Jon Krohn: 07:58

Find me some angel dust, find me a primary care physician. Got it, got it, got it. That PCP.

Nate Fox: 08:04

Right. Yes, exactly. I mean, the utilization of this feature was phenomenal. People were like, “Oh, wow, I’ve been meaning to go to a doctor, but it’s kind of a pain. So sure, find me that doctor.” Once someone clicked this button, said, “Yes, go find me that doctor,” and provided some information on the kind of doctor they were looking for, Nate and I would use the data we had in our platform to book them care.

Jon Krohn: 08:29

Right.

Nate Fox: 08:29

And-

Jon Krohn: 08:30

And Nate is your co-founder and you guys have the same name? I mean not the same-

Nate Fox: 08:32

Yes.

Jon Krohn: 08:33

… whole name, the same first name?

Nate Fox: 08:34

Yes, yes. Just to clarify for the listeners, that’s right: my co-founder’s name is also Nate. I usually go by Fox at the company. That’s kind of my call sign; it just makes things easier for folks. But yeah, that’s right.

Jon Krohn: 08:48

Nice, cool. Well, you’re fortunate to have such a cool last name. It sounds like a nickname.

Nate Fox: 08:53

It is, it is. It’s a great nickname. But getting back to your question, the way we came upon this problem is that when Nate and I were trying to find and book this care, we realized how bad the data was. We were aggregating data from a number of different sources, and we would call a provider and they would say, “Oh nope, out of state.” “Oh nope, this doctor’s retired.” “Nope, this doctor doesn’t accept that insurance.” “Actually, this doctor passed away.” It was quite difficult. We would sometimes have to call 10 to 20 different providers to get patient X access to the care they needed. So we didn’t start off saying, “Oh, we’re going to solve this massive data problem.” We actually said, “Hey, we want to book more appointments faster, more effectively, at high quality for our patients.” And to do so, we started aggregating and collecting data and trying to predict the probability that a data point is accurate or reliable, to make our actual operations more effective.

Jon Krohn: 09:48

I see. I see where we’re going here. I’m starting to understand exactly what Ribbon’s doing today. So you were trying to deliver convenient, affordable healthcare to people. People were discovering through your platform that they did have an annual free visit to a primary care physician. And so they’re like, “Yes. Wow. That’s cool. I’d like to have that.” And then in the backend, you guys would set about trying to find them a primary care physician. And you kept discovering that doctors in their network weren’t accepting new patients, or the contact details were no longer relevant or something like that. And yeah, I came across a stat while I was researching for the episode that something like 50% of data points in healthcare in the US are wrong.

Nate Fox: 10:38

Yes. That’s exactly right. And it’s pretty wild, actually. You can read a lot about this from the CMS, which has been working hard to hold insurance companies and health plans accountable, saying, “Hey, you can’t have directories with this level of inaccuracy.” There are a lot of different studies on just how inaccurate this data is, and it’s a really big systemic problem.

Jon Krohn: 10:58

What’s a CMS?

Nate Fox: 11:00

Oh, the Centers for Medicare & Medicaid Services in the US.

Jon Krohn: 11:03

So it’s like a federal body that tries to regulate the quality of service that’s provided or something?

Nate Fox: 11:10

Yes, that’s correct. That’s correct.

Jon Krohn: 11:12

Cool. All right. So then I can see the problem. You’ve got this data quality problem. So then you guys think, let’s focus on building a data API for healthcare data. And in addition to providing data, we will use data models, machine learning models to try to identify which data points are more likely to be correct than others.

Nate Fox: 11:39

That’s correct. That’s correct.

Jon Krohn: 11:42

Cool. And then so I guess there’s something, like in some situations, if there’s a high probability that the data are wrong, maybe you don’t surface it at all or maybe you surface it with a warning or something like that.

Nate Fox: 11:56

Yeah. This is a fascinating case of applying machine learning and AI. I think it’s so important to have cross-functional collaboration between data science, product engineering and the API. But yeah, we actually have confidence scores for the data points that we share. We wanted to simplify it, so we made a five-point system. Five is verified, validated: we know it to be true. Four means our model has super high confidence, 90%-plus accuracy, that it’s true, all the way down to one, which is actually the inverse: we’re very confident that it’s wrong.
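The five-point scale can be pictured as a mapping from a model's estimated probability that a data point is correct to a discrete score. This is a minimal sketch under stated assumptions: only the anchors come from the conversation (5 = verified, 4 = roughly 90%-plus model confidence, 1 = confidently wrong). The function name and the cut-offs for 3, 2 and 1 are hypothetical, not Ribbon Health's actual thresholds.

```python
def confidence_score(p_correct: float, verified: bool = False) -> int:
    """Map a model's probability that a data point is correct to a 1-5 score.

    Hypothetical thresholds: only 5 (verified) and 4 (~90%+ confidence) are
    described in the episode; every cut-off below 0.90 is an assumption.
    """
    if not 0.0 <= p_correct <= 1.0:
        raise ValueError("p_correct must be a probability in [0, 1]")
    if verified:
        return 5  # confirmed true, e.g. by direct validation
    if p_correct >= 0.90:
        return 4  # model is highly confident the point is correct
    if p_correct >= 0.50:
        return 3  # assumed: more likely right than wrong
    if p_correct > 0.10:
        return 2  # assumed: more likely wrong than right
    return 1  # model is highly confident the point is wrong


for p in (0.97, 0.65, 0.30, 0.04):
    print(p, confidence_score(p))
```

Collapsing a continuous probability into five buckets trades precision for a score that product teams and end users can reason about without reading model internals.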

Jon Krohn: 12:34

Right.

Nate Fox: 12:34

And what’s really interesting from a data problem-solving standpoint is that, depending on our end user, both kinds of data points are really useful. Health plans, for instance, really want to know which data is, with high confidence, wrong, so they can remove it from their directory. And then, of course, digital health companies or people doing care navigation want to use only very high-accuracy data and filter all the noise out of their system, so they get better hits in helping patients find a doctor who is actually where they are supposed to be.
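The same scored data serves two opposite needs. The sketch below assumes a hypothetical list of scored location records (the field names are illustrative, not Ribbon Health's schema): a health plan pulls the points scored 1 to clean up its directory, while a care-navigation app keeps only points scored 4 or 5.

```python
from typing import Dict, List

# Hypothetical scored records; field names are illustrative only.
Record = Dict[str, object]


def likely_wrong(records: List[Record]) -> List[Record]:
    """Health-plan view: points the model is confident are wrong (score 1)."""
    return [r for r in records if r["confidence"] == 1]


def high_confidence(records: List[Record], minimum: int = 4) -> List[Record]:
    """Care-navigation view: keep only points at or above `minimum`."""
    return [r for r in records if r["confidence"] >= minimum]


records = [
    {"provider_id": "A1", "address": "100 Main St", "confidence": 5},
    {"provider_id": "A1", "address": "9 Old Rd", "confidence": 1},
    {"provider_id": "B2", "address": "20 Park Ave", "confidence": 4},
    {"provider_id": "B2", "address": "7 Elm St", "confidence": 3},
]

print([r["address"] for r in likely_wrong(records)])     # ['9 Old Rd']
print([r["address"] for r in high_confidence(records)])  # ['100 Main St', '20 Park Ave']
```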

Jon Krohn: 13:07

Very interesting. Well, I’ve got a first question and then a second question. The second question is going to be: you’ve mentioned a couple of different kinds of users of your API there, so I’d love to hear what kinds of organizations or people use your API. But before we get to that, for any listeners out there who don’t know: what is an API? What does that mean?

Nate Fox: 13:35

Yes, absolutely. So API stands for application programming interface. It’s basically a protocol that allows other developers to utilize a service within their code in a very effective, seamless way. Typically, it’s kind of like the pipes through which programs talk to each other. I always think a very powerful example of an API is Stripe, right? If you’re developing an e-commerce website, you don’t want to have to build your own payments infrastructure and platform. With Stripe, in a few lines of code, boom, you have that ready to go. So it abstracts away a lot of complexity, and a lot of developers love using all kinds of APIs in the things they’re building.

Jon Krohn: 14:20

Cool. So the programmer doesn’t have to develop something from scratch. With the Stripe example you’re giving, they don’t have to come up with their own payment verification system. They can use the Stripe API and then they provide information to the API, or their users can indirectly provide information to the API like their credit card details. And then Stripe can handle doing some stuff behind the scenes, making the payment happen, verifying the credit card details. And then it can return some information back to you. So the API takes information like credit card details and then brings back other information like, “Verified, this payment is all good.” And then you can choose what to do with that information. You can present it to the user, you can bring them to another screen that says, “Congratulations, your payment has gone through.” Yeah. So APIs allow information to go back and forth and to abstract away the complexity of programming for programmers.
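The request/response idea can be made concrete with a minimal sketch of an API boundary. The `payment_api` function and its JSON fields are hypothetical stand-ins, not Stripe's API. The only real check inside is the Luhn checksum, which catches most mistyped card numbers but does not authorize a payment. The caller sends JSON in and gets JSON back without knowing any of the logic behind the boundary.

```python
import json


def luhn_valid(card_number: str) -> bool:
    """Luhn checksum: catches most typos in a card number (it does not authorize a charge)."""
    digits = [int(c) for c in card_number if c.isdigit()]
    if len(digits) < 12:
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def payment_api(request_body: str) -> str:
    """Hypothetical endpoint: JSON in, JSON out. Callers never see the internals."""
    request = json.loads(request_body)
    ok = luhn_valid(request.get("card_number", ""))
    response = {"status": "verified" if ok else "rejected", "amount": request.get("amount")}
    return json.dumps(response)


# Caller side: a few lines, with no knowledge of how verification works.
reply = json.loads(payment_api(json.dumps({"card_number": "4242 4242 4242 4242", "amount": 1999})))
print(reply["status"])  # verified
```

The caller only depends on the shape of the request and response; the provider can change everything behind `payment_api` without breaking it, which is the abstraction Jon describes.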

Nate Fox: 15:24

Yes, yes, absolutely.

Jon Krohn: 15:27

99% of machine learning teams are doing awesome things at a reasonable scale with, say, about four people and two production machine learning models. But most of the industry best practices that we hear about are from a small handful of companies operating models at hyperscale. The folks over at Neptune.ai care about the 99%. And so they are changing the status quo by sharing insights, tool stacks and real-life stories from practitioners doing ML and MLOps at a reasonable scale. Neptune have even built a flexible tool for experiment tracking and model registry that will fit your workflow at no scale, reasonable scale and beyond. To learn more, check them out at Neptune.ai. That’s Neptune.ai.

Nate Fox: 16:15

And just as in the Stripe example you described really well, our API does the same sort of complexity abstraction for provider data. All those messy thousands of sources you have to deal with, managing and reconciling that data, editing and changing the information when you realize it’s incorrect and needs correcting: it’s a giant data management nightmare, and our API aims to make that a lot more seamless.

Jon Krohn: 16:40

Cool. All right. So let’s dig into that a bit more. So what kind of information do you provide to the API? What kind of information comes back? What are some examples of users of this information?

Nate Fox: 16:54

Yep, absolutely. So one of our primary use cases is what we call the find care use case. A lot of people are using our technology to help patients find care in a variety of different contexts. One example of an endpoint is our providers endpoint. You can basically tell this endpoint, “My address is 100 Main Street in New York City. I have Humana as my insurance, and I’m looking for a doctor that treats patients like me, male, age 30 to 39.” The API ingests that information and uses it to filter and do a geo search for the providers that actually match those parameters. And the response says, “Hey, here are 10 providers and all of their data: what conditions they treat, their average rating, their score, their distance to the address that you put in.”
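The providers search described above (filter by insurance and patient profile, then rank by distance) can be sketched in a few lines. Everything here is hypothetical: the record fields, the in-memory list standing in for a database, and the assumption that the patient's address has already been geocoded to latitude and longitude. Distance uses the haversine great-circle formula.

```python
import math
from typing import Dict, List, Tuple

EARTH_RADIUS_MILES = 3958.8


def haversine_miles(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Great-circle distance between two (lat, lon) points, in miles."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))


def search_providers(providers: List[Dict], origin: Tuple[float, float],
                     insurance: str, age: int, sex: str, limit: int = 10) -> List[Dict]:
    """Filter by insurance and patient profile, then return the nearest matches."""
    matches = []
    for p in providers:
        low, high = p["age_range"]
        if insurance in p["insurances"] and low <= age <= high and sex in p["sexes_seen"]:
            matches.append((haversine_miles(origin, p["location"]), p))
    matches.sort(key=lambda pair: pair[0])  # nearest first
    return [{**p, "distance_miles": round(d, 1)} for d, p in matches[:limit]]


# Hypothetical provider records for illustration.
providers = [
    {"name": "Dr. A", "location": (40.7527, -73.9772), "insurances": {"Humana"},
     "age_range": (18, 99), "sexes_seen": {"male", "female"}},
    {"name": "Dr. B", "location": (40.7061, -74.0087), "insurances": {"Humana", "Aetna"},
     "age_range": (0, 17), "sexes_seen": {"male", "female"}},
    {"name": "Dr. C", "location": (40.8075, -73.9626), "insurances": {"Humana"},
     "age_range": (18, 65), "sexes_seen": {"male"}},
]
origin = (40.7484, -73.9857)  # assumed geocode of the patient's address
for m in search_providers(providers, origin, "Humana", age=35, sex="male"):
    print(m["name"], m["distance_miles"])
```

Dr. B is excluded because the sketch's pediatric age range does not cover a 35-year-old. A production system would push both the filter and the geo search into an indexed data store rather than scanning a list.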

Jon Krohn: 17:47

Nice.

Nate Fox: 17:47

And then… Yeah.

Jon Krohn: 17:49

And then confidence scores on top of some of the information as well.

Nate Fox: 17:52

Exactly, right. And I actually forgot to mention: a lot of times people use what we call our minimum location confidence parameter. They’re actually saying, “Hey, Ribbon, I only want data that is highly accurate.” So oftentimes that’s an input. Then the end users can see, “Oh, I can trust this data,” and our customers can decide how to use that confidence score in the front end for the end user.
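On the client side, a confidence threshold is just another request parameter, and the returned score still needs a front-end presentation. The sketch below only builds a request URL with the standard library and maps scores to display labels. The base URL, the path and the parameter names (including `min_location_confidence`) are hypothetical placeholders modeled loosely on the parameter mentioned above; Ribbon Health's real API may name and format them differently.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.example.com/v1/providers"  # hypothetical endpoint


def build_search_url(address: str, insurance: str, min_confidence: int) -> str:
    """Build a GET URL asking the server to drop low-confidence locations."""
    params = {
        "address": address,
        "insurance": insurance,
        "min_location_confidence": min_confidence,  # hypothetical parameter name
    }
    return f"{BASE_URL}?{urlencode(params)}"


def display_label(confidence: int) -> str:
    """One possible front-end choice for presenting a 1-5 score to a patient."""
    if confidence >= 5:
        return "Verified"
    if confidence >= 4:
        return "High confidence"
    return "Call ahead to confirm"


url = build_search_url("100 Main Street, New York, NY", "Humana", 4)
print(url)
```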

Jon Krohn: 18:15

Nice. So that makes sense. So when you describe a situation like that, somebody looking for a doctor, most people don’t know how to write programs themselves. Most people wouldn’t be able to call your API to let them know that they live at 100 Main Street and they’re between 30 and 39 and that they want high quality information. So then what kind of person, what kind of organization uses your API to surface this information in a user interface for a user to work with?

Nate Fox: 18:43

Yep. Absolutely. So it’s really interesting, actually. There are 10 billion-plus healthcare decisions being made every year in the US, and they happen in so many different kinds of contexts. I think there are over 100,000 healthcare applications doing some sort of decision making in healthcare. This means our client base actually spans many different segments, so I’ll use three broad categories to describe for listeners the kinds of people using our technology.

Jon Krohn: 19:12

Mm-hmm (affirmative).

Nate Fox: 19:13

The first is health plans. These are insurance companies that are very deep in the provider data problem. The second is providers, which are companies that provide care to patients. You can think of a company like One Medical or a telemedicine company like Ro, where they’re actually providing patients with care. And then finally, a category we call digital health. There’s been a massive explosion in investment in this space, and there are many kinds of companies offering direct-to-consumer services or care navigation for employers, all kinds of solutions. They stumble upon this provider data problem and say, “Wow, this is really hard and challenging. I wonder if there’s technology that helps me solve it so I can focus on my other core competencies.” So that’s another segment.

Jon Krohn: 20:00

Super cool. All right. So when you set out to design this API from scratch, how do you do that? As the CTO of a company, you’re like, “Okay, clearly there’s this big data problem. We need to fix this. We need to provide plumbing to all these different kinds of users: health plans, healthcare providers, digital health companies.” How do you then set out to design this API?

Nate Fox: 20:26

It’s a great question. One thing that comes to mind is that we actually had a lot of empathy as a company, because we were using other data services and provider data solutions before we built what is now Ribbon Health. Having built healthcare applications ourselves, a lot of the things we wished we’d had in an API to help us build the [inaudible 00:20:52] we were building guided some of our initial product development. So that was kind of where we started, right?

Jon Krohn: 21:00

Right. So I imagine a lot of people, when they’re designing an API, need to talk to potential customers, because they might not have firsthand experience of the API they would need. You have this idea, you’re like, “Okay, I have this idea for an API.” But without having actually needed that API yourself, you might need to talk to different prospective users. In your case, you had already built a user interface that would have loved to have the API you later built. So you could just imagine, “Well, what would I love to have? Okay, I’d love to have X and Y and Z.” And then you prioritize those features: “Okay, what can I get to market relatively quickly and start selling that’s not going to be the biggest possible lift?” And then you just start building and releasing them.

Nate Fox: 21:55

Yep, exactly. And we certainly had a lot to learn from what our customers needed besides just our own experience, and actually-

Jon Krohn: 22:01

Sure, sure, sure.

Nate Fox: 22:02

… there was a really interesting thing we did kind of by organic happenstance. As we talked about earlier in this podcast, we were a care navigation service before. Both my co-founder and I had backgrounds in data startups, so we were very naturally building this backend infrastructure. And then other healthcare companies came to us and said, “Hey, where do you guys get your doctor data? It looks pretty good. Can we access it? Do you guys have an API? Can we tap into this data?” So we started feeling this very strong pull. We had all of these internal backend endpoints that we were using ourselves, and not all of them were ready to be exposed externally with the full infrastructure investment behind them.

Nate Fox: 22:46

And so what we did is, in our documentation, we wrote down all the theoretical endpoints: ones that were either already live or could be built within 24 to 48 hours. We had a status on the documentation page that said “Live” for the endpoints that were truly live in the API, and another status that said “Contact us to learn more.” Through that experimentation, we got a lot of strong market signal about which endpoints were resonating with customers and where we should invest in development. So we could really align our development efforts with what customers were explicitly asking for. Once they saw an endpoint, they were like, “Oh, I want that. Yes. Where is it? Why can’t I access it? I want it now.” And that really helped us have empathy for the end user.
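The documentation experiment amounts to a simple demand ledger: list every endpoint with a status, count inbound requests for the ones marked “Contact us”, and build the most-requested first. A minimal sketch; the endpoint names, the catalog and the inquiry log are invented for illustration and are not Ribbon Health's actual endpoints.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical documentation catalog: endpoint -> status shown to readers.
CATALOG: Dict[str, str] = {
    "/providers": "Live",
    "/locations": "Live",
    "/insurances": "Contact us",
    "/procedures/costs": "Contact us",
    "/specialties": "Contact us",
}


def build_priorities(inquiries: List[str]) -> List[Tuple[str, int]]:
    """Rank not-yet-live endpoints by how many customer inquiries they received."""
    counts = Counter(e for e in inquiries if CATALOG.get(e) == "Contact us")
    return counts.most_common()


inquiries = ["/insurances", "/procedures/costs", "/insurances",
             "/providers", "/insurances", "/procedures/costs"]
print(build_priorities(inquiries))  # [('/insurances', 3), ('/procedures/costs', 2)]
```

Inquiries about endpoints that are already live are ignored here, since the point of the ledger is to decide what to build next.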

Jon Krohn: 23:32

Nice. Very cool. All right. So in your answer to my last question, Nate, you alluded to your background, and I’ve got a question for you related to that. You’ve worked extensively in business analytics at companies like McKinsey, Microsoft and Unified. Did you encounter a lot of incorrect and missing data in those experiences as well? And if so, how has your experience with those kinds of business analytics situations helped you on your journey to co-founding Ribbon?

Nate Fox: 24:05

It’s a very interesting question. At Microsoft I was working on what was called the Windows scorecard, a really interesting KPI dashboard the executives there used to understand the state of their business. So what’s our market share for Internet Explorer, or for Surface tablets, what have you? And you can imagine that for a scorecard like that, you have tons and tons of different sources that you have to aggregate and centralize in one place. That’s one thing. And then Unified was this amazing company that works on the marketing supply chain of ad tech. How does a CMO understand their hundreds of millions of dollars of ad spend across Snapchat and Instagram and Facebook and Google, et cetera? It’s actually really hard to answer that question, because the data is very fragmented across many different silos. Right.

Jon Krohn: 24:56

I can see where you’re going.

Nate Fox: 24:58

So you can kind of see the parallels. I think the things I learned there were the importance, the power and the value of bringing data together in one centralized place, because it’s really, really challenging to do so. And then also, how do you separate the signal from the noise? What are the principles by which you decide the data is accurate and actually matters, and how do you design systems that apply strong analytics so you can take action on that data and make it actually useful? So I think those are probably the things I picked up and applied that are relevant to Ribbon Health.

Jon Krohn: 25:34

Nice, great answers. And then, I imagine that now that you’re at Ribbon and you’re worried about data quality, there are probably a lot of other issues with data too. For example, how do you address issues around the uptime and reliability of your API? Data quality is one thing, but how do you also ensure, on top of that, the reliability of your API?

Nate Fox: 26:05

Oh yeah, it’s a great question. So first, it starts with an amazing API team at Ribbon Health, which really ensures high uptime and performance. It’s so critical for us to be really reliable because people are building their applications and their healthcare experiences on top of our API. And the stakes are pretty high because if we go down, people who are trying to access care can’t access it in that moment. So we take it very, very, very seriously.

Jon Krohn: 26:33

Right.

Nate Fox: 26:34

And so the way we do this is that we leverage a lot of the load balancing and infrastructure that AWS offers to ensure that if there are spikes in demand, we can handle them. In addition, we have a variety of testing and also monitoring, right? We’re big fans of Datadog, which allows us to understand immediately if there’s something happening with the calls that we’re getting, and then to quickly either revert or make a change so that our API doesn’t go down.
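Datadog monitors are configured inside Datadog itself, and the episode doesn’t describe Ribbon’s setup. As a hedged, plain-Python illustration of the idea Nate describes (notice a spike in failing calls quickly, then revert or fix), here is a sliding-window error-rate check. The `ErrorRateMonitor` class, the 100-call window, and the 5% threshold are hypothetical choices, not Datadog’s API.

```python
from collections import deque


class ErrorRateMonitor:
    """Hypothetical sliding-window monitor over recent API response codes."""

    def __init__(self, window: int = 100, threshold: float = 0.05) -> None:
        # Keep only the most recent `window` outcomes; older ones fall off automatically.
        self.recent = deque(maxlen=window)
        self.threshold = threshold

    def record(self, status_code: int) -> None:
        # Count 5xx responses as server-side failures.
        self.recent.append(status_code >= 500)

    def error_rate(self) -> float:
        return sum(self.recent) / len(self.recent) if self.recent else 0.0

    def should_alert(self) -> bool:
        # Alert once the share of failing calls in the window exceeds the threshold.
        return self.error_rate() > self.threshold


monitor = ErrorRateMonitor()
for code in [200] * 95 + [503] * 5:
    monitor.record(code)
print(monitor.should_alert())  # False: exactly 5% is not above the threshold
monitor.record(500)
print(monitor.should_alert())  # True: the window now holds 6 errors in 100 calls
```

A bounded `deque` keeps memory constant no matter how many calls arrive; a hosted monitoring tool would do the same kind of windowed aggregation for you.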

Jon Krohn: 26:59

Nice. That’s a really good tip there. Yeah. Datadog, I could imagine, would definitely be useful for monitoring reliability. It’s what it’s designed for. So-

Nate Fox: 27:07

Yes, and stack tracing. Stack tracing is incredible, because the moment you hear something’s going wrong, you can ask, what’s going wrong in here? And you can kind of see it in the code. One of our engineers set that all up and it was awesome.

Jon Krohn: 27:18

Brilliant. All right. So let’s talk about a specific example where you could use data science to solve an issue. We talked about the confidence scores, which is actually something I didn’t ask about in and of itself. How do you train a data science model to assign confidence scores, if that’s something you can share publicly?

Nate Fox: 27:41

Oh, absolutely. So I think this is a fascinating example of how productizing machine learning and AI is a really cross-functional thing. It requires collaboration across data engineering, data science, product, and marketing to actually position a complicated thing, right, a machine learning model, in a way that’s accessible to the end user. And so step one is the data engineering piece we were talking about. We’re collecting data from thousands and thousands of different data schemas, transforming that data, and normalizing it at scale to create what we call this massive, massive knowledge graph. And in this knowledge graph, we have an immense amount of data. A lot of it is incorrect, because it’s impossible for a doctor to be practicing at 50 to 100 locations at once.

Jon Krohn: 28:32

Right.

Nate Fox: 28:32

But more importantly, we normalize that data and get the unique set of information: all the unique sets of addresses and phone numbers and NPIs. An NPI is a unique identifier for a doctor. Really important, by the way, and I could go on and on about join keys and the point of unique identifiers, but I’ll hold off on that. So then you have this knowledge graph, and most of that data is incorrect, right. What’s really important is that we need the right truth set, and this is where the collaboration with product is really important. We actually have a call center where on a monthly basis we sample this knowledge graph and call tens and tens of thousands of providers. And we have a very, very carefully worded script that product has designed with the end user in mind: what is truth? How do you define truth, right? Because everyone has different definitions of truth, but we have our own perspective on that.
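The normalize-then-deduplicate step and the monthly sampling Nate describes can be sketched in a few lines. This is a minimal illustration under assumptions: the record fields, the normalization rules (digits-only US phone numbers, lowercased addresses with "street" shortened to "st"), and the NPI value are hypothetical, and real address normalization needs far more rules than this.

```python
import random
import re
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class ProviderLocation:
    """One unique (doctor, address, phone) fact; the NPI is the join key for the doctor."""
    npi: str
    address: str
    phone: str


def normalize_phone(raw: str) -> str:
    digits = re.sub(r"\D", "", raw)  # keep digits only
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop a leading US country code
    return digits


def normalize_address(raw: str) -> str:
    text = re.sub(r"[^\w\s]", "", raw.lower())  # lowercase, strip punctuation
    text = re.sub(r"\bstreet\b", "st", text)  # toy abbreviation rule
    return " ".join(text.split())  # collapse whitespace


def unique_locations(records: list[dict]) -> set[ProviderLocation]:
    """Collapse raw source rows into the unique set of normalized facts."""
    return {
        ProviderLocation(r["npi"].strip(), normalize_address(r["address"]), normalize_phone(r["phone"]))
        for r in records
    }


def sample_for_calls(locations: set[ProviderLocation], k: int, seed: int = 0) -> list[ProviderLocation]:
    """Draw a reproducible random sample of facts to verify by phone."""
    return random.Random(seed).sample(sorted(locations), k)


raw = [  # hypothetical rows from three different sources
    {"npi": "1234567893", "address": "100 Main Street", "phone": "(555) 010-0100"},
    {"npi": "1234567893", "address": "100 main st.", "phone": "+1 555 010 0100"},
    {"npi": "1234567893", "address": "200 Oak Ave", "phone": "555-010-0200"},
]
facts = unique_locations(raw)
print(len(facts))  # 2: the first two rows describe the same location
print(sample_for_calls(facts, 1))
```

Sorting before sampling makes the draw reproducible for a given seed, which helps when a sample has to be audited later.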

Nate Fox: 29:23

And the product managers actually decided based on what we want the end user to see, which is high-quality data that you can trust: if you call a doctor and say, “I want to book an appointment,” you can book an appointment, and you can go to that address, and that’s correct. Right. And so the script is worded in a way where we call that provider and we say, “Hello. Is this the office of so and so at 100 Main Street? Can I book an appointment?”

Jon Krohn: 29:52

Yeah. “Is this doctor so and so?”

Nate Fox: 29:52

Yeah. This is actually very normal in healthcare. Because the data is so bad, people are used to these validation calls.

Jon Krohn: 29:58

Right.

Nate Fox: 29:58

And so the questions that we ask give us discrete true/false signals that we can then train a classifier on, predicting whether a data point is true or false, right? We experimented with random forests and other kinds of models to do this. And we use an XGBoost model that has hundreds of variables to predict the probability of true versus false.
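The episode says Ribbon settled on XGBoost after trying random forests. To keep the example dependency-free, this sketch swaps in plain logistic regression trained by batch gradient descent, which shows the same contract: true/false labels from validation calls in, a probability between 0 and 1 out. The two features and the toy labels are hypothetical; the real model reportedly uses hundreds of variables.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Batch gradient descent on log-loss; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this labeled fact (probability minus 0/1 label).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x) -> float:
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical features per (doctor, address) fact:
#   [share of sources listing it, 1.0 if the phone number also appears elsewhere]
X = [[1.0, 1.0], [0.9, 1.0], [0.8, 0.0], [0.1, 0.0], [0.2, 1.0], [0.0, 0.0]]
y = [1, 1, 1, 0, 0, 0]  # 1 = the validation call confirmed the fact
w, b = train_logistic(X, y)
print(round(predict_proba(w, b, [0.95, 1.0]), 3))
```

A gradient-boosted tree model would replace `train_logistic` but keep the same inputs (feature rows plus call-verified labels) and the same output (a probability per fact).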

Jon Krohn: 30:24

Cool.

Nate Fox: 30:25

However you define agree and disagree, we then get a prediction… We get an output between zero and one, right. Now, saying 0.723 is not particularly useful to the end user. And this is where product and marketing also come in. How do we position these things in terms of confidence scores that matter, right? And so then there’s a collaboration between data science and product marketing to figure out, “Okay, what does it mean for something to be a four out of five, and how do you position that?” It’s a highly accurate data point, 90% plus accurate. So that’s the full supply chain, from data engineering all the way to product marketing, that makes this a reality.
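Turning a raw probability such as 0.723 into a one-to-five score is a small binning step. The cut points below are hypothetical; the episode only says the mapping was worked out between data science and product marketing. The sketch therefore also includes a helper that measures the observed accuracy in each band against held-out validation calls, which is one way such cut points could be checked.

```python
import bisect

CUTOFFS = [0.2, 0.4, 0.6, 0.8]  # hypothetical band edges


def confidence_score(probability: float) -> int:
    """Map a model probability in [0, 1] to a 1-5 confidence score."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    # bisect_right counts how many cut points are <= probability.
    return bisect.bisect_right(CUTOFFS, probability) + 1


def band_accuracy(scored):
    """Share of confirmed facts (label 1) per score band, from (probability, label) pairs."""
    totals, hits = {}, {}
    for probability, label in scored:
        band = confidence_score(probability)
        totals[band] = totals.get(band, 0) + 1
        hits[band] = hits.get(band, 0) + label
    return {band: hits[band] / totals[band] for band in totals}


print(confidence_score(0.723))  # 4
print(band_accuracy([(0.85, 1), (0.9, 1), (0.95, 0), (0.1, 0)]))
```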

Jon Krohn: 31:00

Cool. I’m so glad that I asked and that you could share that. It’s fascinating. So you start with a knowledge graph, and then you sample from that knowledge graph to manually check the quality of the data. Then you can use those labels to train a machine learning model. Right now you’re using an XGBoost model with hundreds of inputs to predict those labels, and so you get a float-valued confidence score.

Nate Fox: 31:28

Yep.

Jon Krohn: 31:29

And then you convert that into something that’s easy to understand. There’s a five point scale from one to five for users to be able to quickly digest, “Okay, it’s a four, it’s pretty reliable, or it’s a five. Great, I can definitely trust this.”

Jon Krohn: 31:42

Einblick is a faster and more collaborative way to explore your data and build models. It was developed at MIT and has been shown to reduce time to insight by 30 to 50%. Einblick is based on a novel progressive engine, so no matter the data size, your analysis won’t slow down, and Einblick’s novel interface supports the seamless combination of no-code operations with Python code. This makes Einblick the go-to data science platform for the entire organization. Sign up for free today at Einblick.ai. That’s E-I-N-B-L-I-C-K.ai, E-I-N-B-L-I-C-K.ai.

Jon Krohn: 32:24

Nice. Another cool example of machine learning at Ribbon Health that I came across while researching this episode, or that you might have even mentioned to me on a previous call, is that you run into this problem of entities having multiple names. So the same hospital, say, goes by tons of different names. Stanford Children’s Hospital is also known as Lucile Packard. It’s also known as LPCH. There are potentially dozens of different ways of referring to the same hospital. And if you’re a person, particularly if you’re already familiar with the institution, it’s no problem for you to resolve that ambiguity. But to a machine, it’s just a string of characters, and so LPCH looks like a very different thing from Lucile Packard. So you’ve come up with a pretty cool technique for resolving this issue, yeah?

Nate Fox: 33:20

Yep, absolutely. Yeah. So this is a big clustering problem for us with entity information. We use dimensionality reduction via principal component analysis, also known as PCA, as well as clustering via Gaussian mixture modeling. And I cannot take credit for this. It was designed by our amazing data science team. You can actually read about Sameer, who was kind of the mastermind behind this model, in our technical innovators award, as well as one of his peers, Aaron, who worked closely with him on this.
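PCA followed by a Gaussian mixture is a standard pairing; in Python, scikit-learn’s `PCA` and `GaussianMixture` classes are the usual route. To keep this sketch to NumPy alone, it implements both directly: PCA via SVD of the centered feature matrix, then a full-covariance Gaussian mixture fitted with expectation-maximization. It assumes each name variant has already been encoded as a numeric feature vector; the toy features are invented, and nothing here is Ribbon’s actual model.

```python
import numpy as np


def pca_reduce(X: np.ndarray, k: int) -> np.ndarray:
    """Project the rows of X onto their top-k principal components (via SVD)."""
    centered = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:k].T


def gmm_labels(Z: np.ndarray, k: int, n_iter: int = 50, reg: float = 1e-6) -> np.ndarray:
    """Fit a k-component, full-covariance Gaussian mixture with EM; return hard labels."""
    n, d = Z.shape
    # Deterministic farthest-point initialization of the component means.
    means = [Z[0]]
    for _ in range(1, k):
        dist2 = np.min([((Z - m) ** 2).sum(axis=1) for m in means], axis=0)
        means.append(Z[dist2.argmax()])
    means = np.array(means)
    covs = np.array([np.eye(d) for _ in range(k)])
    weights = np.full(k, 1.0 / k)

    for _ in range(n_iter):
        # E-step: log of weight times Gaussian density, per point and component.
        log_p = np.empty((n, k))
        for j in range(k):
            diff = Z - means[j]
            maha = np.einsum("ni,ij,nj->n", diff, np.linalg.inv(covs[j]), diff)
            _, logdet = np.linalg.slogdet(covs[j])
            log_p[:, j] = np.log(weights[j]) - 0.5 * (maha + logdet + d * np.log(2 * np.pi))
        log_p -= log_p.max(axis=1, keepdims=True)  # avoid underflow before exp
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)

        # M-step: re-estimate weights, means, and covariances from responsibilities.
        nk = resp.sum(axis=0)
        weights = nk / n
        means = (resp.T @ Z) / nk[:, None]
        for j in range(k):
            diff = Z - means[j]
            covs[j] = (resp[:, j, None] * diff).T @ diff / nk[j] + reg * np.eye(d)
    return resp.argmax(axis=1)


# Toy feature vectors for six name variants at two sites (values invented).
X = np.array([
    [0.00, 0.10, 0.00],
    [0.10, 0.00, 0.10],
    [0.05, 0.05, 0.00],
    [5.00, 5.10, 5.00],
    [5.10, 5.00, 4.90],
    [4.95, 5.05, 5.05],
])
labels = gmm_labels(pca_reduce(X, 2), k=2)
print(labels)  # first three share one label, last three the other
```

The small `reg` term added to each covariance keeps it invertible when a cluster has only a handful of members, which is common when clustering name variants at a single address.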

Jon Krohn: 33:56

Cool. Nice. So do you have a bit more detail for us on how that works? So I guess you have a bunch of different information. So for every hospital, for every entity, you have size information, location information. And so you’re able to provide all of that as inputs into a clustering model or a Gaussian mixture model so that it can yeah, start to cluster together, okay, based on the size and the location, it seems pretty likely that Stanford children’s hospital, Lucile Packard and LPCH are actually all the same thing?

Nate Fox: 34:36

Yes. That’s exactly right. And again, you’ll see there’s a formula here, right. We actually built another truth set where we had humans manually cluster this data together, so we could train on something that we know we can trust. And that’s exactly right: we geocode the address into a lat-long coordinate so we can cluster around an address. But even within an address, you’ll see a wide variety of names. And a lot of times addresses actually have multiple entities at them, right. Imagine an address that has different departments, right, a department of gynecology and what have you. And so we actually have to cluster within an address node. And there are other features, like a phone number, a location type, or other metadata, that allow us to then do that clustering more effectively.
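Nate mentions a second truth set in which humans clustered names by hand. One common way to score predicted clusters against such a truth set (an assumption here; the episode doesn’t say how Ribbon evaluates) is pairwise precision and recall: treat every pair of names placed in the same cluster as a prediction and compare it with the pairs the humans grouped together. The names and labels below are toy values.

```python
from itertools import combinations


def same_cluster_pairs(labels: dict) -> set:
    """Every unordered pair of names that share a cluster label."""
    return {
        frozenset((a, b))
        for a, b in combinations(sorted(labels), 2)
        if labels[a] == labels[b]
    }


def pairwise_precision_recall(predicted: dict, truth: dict) -> tuple[float, float]:
    pred_pairs = same_cluster_pairs(predicted)
    true_pairs = same_cluster_pairs(truth)
    hits = len(pred_pairs & true_pairs)
    # Precision: how many predicted merges were right; recall: how many true merges were found.
    precision = hits / len(pred_pairs) if pred_pairs else 1.0
    recall = hits / len(true_pairs) if true_pairs else 1.0
    return precision, recall


truth = {  # human-labeled truth set (toy)
    "Lucile Packard": "hospital",
    "LPCH": "hospital",
    "Stanford Children's Hospital": "hospital",
    "Department of Gynecology": "department",
}
predicted = {  # model output (toy)
    "Lucile Packard": 0,
    "LPCH": 0,
    "Stanford Children's Hospital": 1,
    "Department of Gynecology": 1,
}
print(pairwise_precision_recall(predicted, truth))  # (0.5, 0.3333333333333333)
```

Pairwise scoring ignores the arbitrary cluster IDs the model assigns, so the predicted labels never need to match the human labels by name.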

Jon Krohn: 35:22

Nice. Really cool. Okay. So cool. So I’m starting to understand all of the different facets that you understand as a CTO, Nate. So you’ve got to understand about product, clearly you understand about data science and you understand engineering as well. Things like API uptime. You’re concerned with all these different issues. What is the day to day like as a CTO of a very fast growing API focused data company?

Nate Fox: 35:52

It’s a very interesting question. Every day is a little bit different. I think the CTO role especially can come in a number of different forms. For me, the north star that I try to think about is what can I do to best scale and help grow the technology organization more broadly? And so there are three major buckets that I see. One is hiring, right, building a great team. At the end of the day, great engineers, data scientists, product managers, et cetera, want to work with other great talent, right? So meeting great talent and helping them understand the mission, what we’re trying to solve, that’s one piece. Another piece is the technology organization itself: working with other leads within the organization to figure out whether how we’re operating and building process together is effective. How are we doing sprint planning? How are we assembling in pods? What’s our process for building products? And as a scaling startup, what worked really well at 10 people, at 20, at 70, et cetera, is always changing. And so we’re always thinking through how we make decisions and empower the team to do their best creative work.

Jon Krohn: 36:57

Nice. And then-

Nate Fox: 36:58

Then the last-

Jon Krohn: 36:59

One more? Yeah, yeah.

Nate Fox: 37:00

Yeah, yeah. And I guess the last piece is thinking about the technical architecture, the technologies we’re using more broadly and the product that we’re building specifically.

Jon Krohn: 37:11

Nice. Very cool. Yeah. And congrats on being able to grow through all of those stages. I can imagine that it’s quite a different situation at each one. All of these things, whether it’s hiring, how the organization itself is organized, or the technical architecture, change at each stage, from your company of 10 people with maybe a couple of dozen clients to now, yeah, many dozens of employees and however many clients. It’s constantly orders of magnitude more, and you need to be adjusting all of these different aspects rapidly. So it sounds like a big challenge, but probably pretty exciting.

Nate Fox: 37:53

No, it’s a ton of fun. And I think what really makes it special is the people here at Ribbon Health. We have our company values that we take seriously and really lean into. And so the people here solve hard problems, and they’re highly kind and empathetic. So yes, it’s a bit chaotic and a lot’s happening, but it’s a lot of fun, especially when you’re building things with people that you enjoy working with.

Jon Krohn: 38:21

Nice, nice. So I imagine as CTO now of quite a large company, you don’t get to roll up your sleeves and write code yourself as much as you might like. But do you have any particular software tools that you think would be interesting for our audience to hear about?

Nate Fox: 38:37

Yeah, absolutely. And actually I’ll be inspired by the person I mentioned earlier, Sameer, who leads our data science function here at Ribbon Health. The service that I’d mention is AWS Lambda, which is a serverless, event-driven compute service that AWS provides. The way this works is you can basically build these functions on AWS in Lambda, and you can call these functions through an API or when an event is triggered. It’s allowed us to do two interesting things. The first is that it allows us to parallelize really, really compute-intensive work, actually the clustering work we mentioned for locations. Every time we get a new location, we have to recluster the data at that address node. And so we can run a lot of these micro-transactions, or micro-events, and parallelize millions and millions and millions of them in a given day.

Nate Fox: 39:31

And the second thing is that it sets the bedrock on which our data engineers and data scientists can interact with each other, by having a Lambda function that they can call where they can say, “Hey, score this data point, tell us how accurate it is.” And once that interaction node is established, data science can update and change their code and how things work while the core data pipelines keep running as they do. So it allows for very seamless interaction between data engineering and data science.
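Lambda’s Python runtime invokes a handler function with an `event` and a `context` argument, and a handler behind API Gateway’s proxy integration returns a dict with `statusCode` and a JSON-string `body`. The sketch below shows the kind of scoring contract Nate describes, where data engineering calls a stable function and data science can change the code behind it. The handler name, the event shape, and the `score_record` placeholder are hypothetical.

```python
import json


def score_record(record: dict) -> float:
    """Hypothetical placeholder for the data science model: share of agreeing sources."""
    sources = record.get("source_count", 0)
    return record.get("agreeing_sources", 0) / sources if sources else 0.0


def lambda_handler(event, context):
    # Accept either a direct invocation ({"record": {...}}) or an API Gateway
    # proxy event, where the request payload arrives as a JSON string in "body".
    body = event.get("body")
    payload = json.loads(body) if isinstance(body, str) else event
    probability = score_record(payload["record"])
    return {"statusCode": 200, "body": json.dumps({"probability": probability})}


# Local check without AWS: Lambda would pass a real context object instead of None.
print(lambda_handler({"record": {"source_count": 4, "agreeing_sources": 3}}, None))
```

Because callers only depend on the event shape and the response shape, `score_record` can be swapped for a new model without changing the pipelines that invoke it.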

Jon Krohn: 39:58

Super cool. Yeah. Lambda functions are great with AWS. So it allows you to have some function that you’d like to be able to call, but you don’t know how much you’re going to need to be calling it at a given moment in time. So having your own servers configured to handle highly variable loads can be tricky. And so this kind of serverless approach where you let AWS manage that for you, allows you to scale particular functions that you’d like to run to any size really rapidly.

Nate Fox: 40:30

Yeah. Exactly right. Exactly right.

Jon Krohn: 40:32

Cool, Nate. So clearly you all are doing some really interesting data science and engineering. And you alluded to this a little bit earlier that you have a value-driven culture. What is it exactly that you look for in the data scientists that you hire, the engineers that you hire? And you are doing some hiring right now, aren’t you?

Nate Fox: 40:52

Yes, yes, please. We have so many interesting, fascinating data science challenges. And I think the thing I’d stress is that our challenges are really driving value within healthcare. So if you’re the kind of data scientist that wants to focus on helping patients find care as opposed to optimizing the next ad click, Ribbon Health is a great place for you to come to. In terms of what we look for in hires, there are two dimensions at a high level. One is aptitude for the role. So, you know, is this person technically strong and excelling at… [inaudible 00:41:28] the aptitude for doing it. We don’t just look at whether they have done it before; we also look for aptitude. The second is values alignment. Here at Ribbon Health we have six core values that we created before the company was even founded, because values aren’t something you can just declare, “Oh, these are our values,” and then just say them in an organization. You have to be intentional about it. At least that’s the way that we thought about it. And the first of these six values is run towards hard problems.

Jon Krohn: 41:56

Run towards hard problems. Yeah.

Nate Fox: 41:58

The next one is putting your team first. Third is do what you say. That doesn’t necessarily mean that you have to do everything that you say, right. But if you say you can do something and you can’t do it, let your teammate know, right. Be accountable and be transparent when you commit to something. Another value: stay hungry, keep improving. This is the notion that no matter how well you’re doing, there’s always room for being even better and improving yourself. The fifth is practicing habits of excellence. We don’t celebrate just the outcome; we actually celebrate the process of getting there. Are we doing the right things to get to the right outcome? The process does matter. And then finally, building empathy. At the end of the day, it’s important to think about what’s best for the patients. So when you’re working on these hard data problems, never lose sight of how this data and the technologies that we’re building affect real humans and patients at the end of the day.

Jon Krohn: 42:55

Nice. Super cool. So aptitude, and then you want values alignment, and I love your six values: running head first into hard problems, putting your team first, doing what you say, staying hungry and improving, practicing habits of excellence, and building empathy. Really cool. And it sounds like an amazing organization to work for, with a values-driven culture where the whole purpose of the company is streamlining healthcare and getting people the help that they need more quickly. I’m sure there are a lot of people out there with whom that message resonates. So yeah, listeners, get in touch with Ribbon Health. It sounds like an amazing place to work. All right. So beyond Ribbon, what are other ways that engineers can make a big social impact in health tech?

Nate Fox: 43:47

Oh, there are so many ways. The world of health tech is absolutely exploding. I think there’s been $50 to $100 billion of investment over the last year or two just in the space of digital health. It’s really tremendous. We actually partner with a lot of these companies. And so if you think through your interests, there’s just so much opportunity to create value, especially as these companies are increasingly tech-forward. So it’s a wonderful time to jump into an industry that is craving innovation and the use of more modern technologies. There’s just a lot of opportunity for creativity in solving really hard problems that affect patients.

Jon Krohn: 44:29

Very cool. Yeah. It’s really exciting. An amazing time to be a data scientist. So many different kinds of problems we can be tackling. More and more data being collected from exponentially more sensors around the planet at any given time. Companies like yours that are verifying data quality, so now providing data scientists with high quality data that they could be building models with. Really, really yeah, couldn’t be a better time to be interested in data science and making a big social impact. So beyond health tech and the big social impact that people can make in that, another thread in your career in particular, Nate, is that you’ve been working in venture capital. So first at F-Prime Capital and then as an InSITE Fellow. So first, I guess the InSITE fellowship sounds like a super interesting program. So I’d love for you to let the audience know about what that is. And then what did those experiences at F-Prime Capital and InSITE afford you that helped you launch a startup so successfully yourself?

Nate Fox: 45:31

Yeah, absolutely. So both were incredibly helpful and informative. On the venture capital side, understanding how investors think about the companies they want to invest in and what they expect to see really taught me how to think about startups and venture in general. And if you’re going to start a company, how do you want to fit into the venture model, and how do you engage with it in an intentional way? On the InSITE fellowship, I really enjoyed that program. For context, I think there are a number of cities in the US that have chapters. I know there’s one in New York and one in Boston, and I believe there’s also one in the Bay Area. The InSITE fellowship is basically a cross-section of students, a mixture of MBA students, law school students, and software engineers as well, who come together to do pro bono internships for startups in the area.

Nate Fox: 46:28

And so while I was in school, I had met Nate and we were working on a number of ideas together, but early on, it was a great way for me to meet other founders and startups working on all kinds of problems. You would get a consulting project and be able to sink your teeth into it and help them, and see what it’s like to actually be on the ground and really understand some of the hard problems they’re trying to solve.

Jon Krohn: 46:51

Really cool. All right. Yeah. InSITE sounds like an amazing opportunity for people who are interested in those cities you mentioned and maybe other cities as well. And so yes, in addition to that venture capital experience, I see that you also co-founded a toy company, and that you have a passion for patenting educational toys. So what inspired you to work on that?

Nate Fox: 47:14

Yes. It’s one toy patent, and I think one of my colleagues put on the website that I love doing that. So in college, my roommate, Mike [Lo 00:47:27], a wonderful, amazing person that I loved working with and still keep in touch with to this day, and I took a toy product design class at MIT. It was a fascinating class in product design. Designing a toy is kind of fascinating, right, because it has to be very appealing to the end user, but it also has to be easy and relatively affordable to manufacture. So it’s a very interesting design challenge to think through in terms of design thinking. So anyway, there was this concept that my friend Mike and I worked on (he’s one of my close friends and was in my wedding party) where we said, “Hey, this was a really fun toy to build in class. We’re here in college. Let’s experiment: can we actually build this and see if people want to buy it?” Because kids loved it.

Nate Fox: 48:14

And the toy was actually a card connector that made building card castles really seamless and easy. There was an interesting way we made the injection mold for this plastic piece; we got it manufactured in China and then built the pieces. And then we met this wonderful lawyer who was helping MIT students make patents pro bono at the time. She guided us through the process and we got it patented. And the very funny thing was that another company built a very similar product; it was essentially the same as ours. They were interested in acquiring our patent from a licensing perspective, and we ended up selling it to that company, which was a very interesting experience for us as students.

Jon Krohn: 48:56

Yeah. Wow. Yeah. Tons of experience I imagine in that, and yeah, you can learn a lot about intellectual property and yeah, again, you’re making an impact. You are creating things that people really want to use. And so no doubt that also had some impact on your ability to design great products now later in your career. Cool, Nate. So it has been absolutely wonderful interviewing you. We are getting near the end of the episode, which means that it’s time for a question that listeners certainly are accustomed to. Nate, do you have a book recommendation for us?

Nate Fox: 49:33

Yes, absolutely. There’s a book that my co-founder Nate recommended to me a long time ago that I really enjoyed: How Will You Measure Your Life? by Clayton Christensen. It essentially goes into the importance of certain things in your life, like your relationships. It’s a fascinating book that asks you to take a step back and ask what actually matters in your life, and when you look back on it, how will you measure it? Hence the title. It really makes you reflect on what’s important.

Jon Krohn: 50:10

And the answer is dollars, right? Dollars is what matters at the end of life.

Nate Fox: 50:17

Depends on who you are, but yes. That can be an answer with some people, I guess.

Jon Krohn: 50:20

I guess so. Yeah. It doesn’t seem it’s going to be a very happy answer.

Nate Fox: 50:25

Definitely.

Jon Krohn: 50:26

Awesome. So clearly, Nate, you’re a brilliant guy. We’ve learned tons from you in this episode. How can listeners stay in touch with you after the episode?

Nate Fox: 50:36

Yeah, no. I mean, if you’re interested in learning more about Ribbon Health, you can email me at fox@ribbonhealth.com. And if you’re interested in any of our roles, please apply online. And if you want to use Ribbon, you can also ask for access on our website.

Jon Krohn: 50:51

Nice. And do you use Twitter, LinkedIn, or any social media that people could follow you on?

Nate Fox: 50:56

Yeah, you can follow me on Twitter at @natethefox, as well as on LinkedIn.

Jon Krohn: 51:02

Nice. All right. Thank you so much, Nate. We’ll be sure to include those links in the show notes. Thank you so much for joining us on the program and maybe we can check in sometime in the future and see how your journey is coming along.

Nate Fox: 51:16

Absolutely. Jon, thank you so much for your many questions, and it was a pleasure and had a blast with you. Thank you so much for having me here.

Jon Krohn: 51:27

What a wonderfully informative episode! Today Nate filled us in on how Ribbon uses knowledge graphs, manually labeled data samples, and an XGBoost model with hundreds of inputs to assign a confidence score of one to five to individual healthcare data points. He talked about how you design an API by talking to prospective customers and prioritizing features, the tremendous value in centralizing fragmented data, how Datadog and extensive stack tracing can help ensure API uptime and reliability, and how AWS Lambda can be used to seamlessly scale API functions up to any number of users or calls.

Jon Krohn: 52:04

As always, you can get all the show notes including the transcript for this episode, the video recording, any materials mentioned on the show, the URLs for Nate’s social media profiles as well as my own social media profiles at www.superdatascience.com/561. That’s www.superdatascience.com/561. If you enjoyed this episode, I’d greatly appreciate it if you left a review on your favorite podcasting app or on the SuperDataScience YouTube channel. I also encourage you to let me know your thoughts on this episode directly by adding me on LinkedIn or Twitter, and then tagging me in a post about it. Your feedback is invaluable for helping us shape future episodes of the show.

Jon Krohn: 52:41

Finally, if you live in the New York area and would like to experience a SuperDataScience episode filmed live, then come to ScaleUp:AI, which will be held on April 6th and 7th. That’s ScaleUp:AI on April 6th and 7th. Many of the biggest names in data science will be there, such as Andrew Ng, Allie K. Miller, Tomas Pfister, and Will Falcon. Wow. I can’t wait to meet all these people. I’ll be moderating a panel on open-source software featuring Clem Delangue, the CEO of mega cool machine learning company Hugging Face, which we expect to edit into a SuperDataScience episode. Should be tons of fun, and I hope to meet you there or somewhere else soon.

Jon Krohn: 53:21

All right, thanks to my colleagues at Nebula for supporting me while I create content for you. And thanks of course to Ivana Zibert, Mario Pombo, Serg Masís, Sylvia Ogweng and Kirill Eremenko on the SuperDataScience team for managing, editing, researching, summarizing, and producing another super interesting episode for us today. Keep on rocking it out there folks, and I’m looking forward to enjoying another round of the SuperDataScience podcast with you very soon.

