Jon Krohn: 00:00
This is episode number 445 with Sinan Ozdemir, Director of Data Science at Directly.
Jon Krohn: 00:12
Welcome to the SuperDataScience Podcast. My name is Jon Krohn, a chief data scientist and bestselling author on deep learning. Each week, we bring you inspiring people and ideas to help you build a successful career in data science. Thanks for being here today and now, let’s make the complex simple.
Jon Krohn: 00:42
Welcome to the SuperDataScience Podcast. I’m your host, Dr. Jon Krohn, and I am so happy to be joined today by Sinan Ozdemir, one of the most clear and articulate explainers of complex concepts that I’ve ever met. Sinan was founder and CTO of Kylie.ai, a conversational AI company that was successfully acquired. He now leads data science at the acquiring firm, a San Francisco-based process automation startup called Directly, that has raised a whopping $67 million so far.
Jon Krohn: 01:15
He’s taught data science and math at a number of prominent institutions, including Johns Hopkins University and even in prison. On top of all that, in the past five years, he somehow managed to find the time to publish four books with a fifth one expected this year. Thanks to Sinan’s considerable breadth of knowledge and good-natured humor, this episode is flush with both facts and fun. We focus on how to design and engineer effective conversational AI algorithms, also known as chatbots, as well as how to successfully integrate them with businesses to truly automate processes.
Jon Krohn: 01:51
We also chat about why it’s so special that data scientists can act as machine learning engineers, as well as the hard and soft skills you need to work on the cutting edge of natural language processing at a rapidly growing startup, what it’s like to teach university level math inside a prison, and the incredible usefulness of pure mathematical proofs for making real world decisions. We do get a little technical at some points. So, today’s episode may appeal primarily to data scientists who are interested in building natural language processing applications like conversational AIs, but there are also practical tidbits for anyone who would like to understand how chatbots can be effectively integrated into a business, to create efficiencies with automation.
Jon Krohn: 02:44
Sinan, welcome to the show. Where in the world are you?
Sinan Ozdemir: 02:49
I am right here. I’m right here in what is currently sunny San Francisco, California. Hasn’t been sunny for the last couple of days, but it’s currently sunny, so I’m happy about that.
Jon Krohn: 02:59
Is sunny a metaphor because your lockdown just lifted a little bit or you mean … Is there a literal sunshine in San Francisco?
Sinan Ozdemir: 03:08
There is literal sunshine right now, which is excellent, but it’s actually a bit of a metaphor. Yeah. Yesterday, California reopened a little bit more to outdoor dining and tattoo shops and haircut places are opened again.
Jon Krohn: 03:25
… which is how you have such a sleek-looking haircut on the show today.
Sinan Ozdemir: 03:29
I did, yes. I was one of those people who as soon as I got the email from my barber that, “We’re reopening on Thursday,” I said, “What is the earliest you have an appointment on Thursday that I can get a haircut?”
Jon Krohn: 03:41
Nice. Well, the people watching the YouTube version of the podcast can enjoy that great-looking haircut.
Sinan Ozdemir: 03:47
Oh, yes.
Jon Krohn: 03:50
So, you’ve been on the podcast before. You were on one of the first episodes. So, there’s been many hundreds of episodes. We’re now in the 400s, and you were on episode 21.
Sinan Ozdemir: 04:02
21? Wow!
Jon Krohn: 04:03
Yeah, which was January-
Sinan Ozdemir: 04:04
That was, what? Four years ago?
Jon Krohn: 04:06
That was four years ago, January 2017, and then you were back three years later in January 2020 for episode 333, working up to that 666 episode.
Sinan Ozdemir: 04:19
You listened to it twice.
Jon Krohn: 04:21
Yeah. Exactly. The second time, you get to hear the secret message from the devil. Now, we’re recording. It’s the final days of January 2021. This episode won’t be live in January. So, we’re ruining your January streak, but it is about a year since your last appearance on the program. So, tell us what’s been going on in your world? Prior to the last episode, your company had just been acquired. So, you had a startup Kylie.ai, and that was acquired by Directly. How’s that going? Do you hate them yet?
Sinan Ozdemir: 05:00
It’s going great. I don’t hate them. I do not plan to hate them in the future. No, it’s really great. Yeah. As you mentioned, we were acquired in September of 2019. I’ve been with the acquiring company Directly ever since as their Director of Data Science, and it’s been going very well. It’s really great to be a part of an acquisition where the acquiring company really cares about the vision of the acquired company and continues that vision as we incorporate into their culture. Culture is great. The people are great. That is one of the main reasons I’m staying, but it is also really nice to be recognized and acknowledged for the work that we did at Kylie and being able to bring that over to help the Directly platform as well.
Jon Krohn: 05:48
Beautiful. That sounds absolutely great. It’s like the dream of being acquired in that kind of situation where you get to continue on and continue to do the very work and have your vision. So, what do you guys do? I mean, we don’t want to talk about it too long because I know we’ve talked about that on previous episodes, but for people who haven’t listened, just a little bit on what you were doing at Kylie.ai and how that has rolled into the Directly work.
Sinan Ozdemir: 06:10
Yeah. So, Kylie.ai was focused more on chatbot and conversational AI generation, maintenance, and optimization. So, really, what that boils down to is we had enterprise-grade clients who would come to Kylie to help build, operate, and maintain conversational AI systems that could also interact with customers through our RPA, robotic process automation. So, the bot could basically actually do things for you, not just tell you how to do things.
Jon Krohn: 06:42
Oh, man!
Sinan Ozdemir: 06:44
Yeah, I know. When we went into Directly, Directly at the time was a lot more about building global networks of community experts for their clients. So, one of their clients is a global travel and hospitality company. They’ll find people around the world who have used their platform, used their system a lot, know the system a lot, and have them help to automate and optimize their customer support systems rather than having to have a bottleneck of people internally.
Sinan Ozdemir: 07:20
So, when we came on to Directly, we were bringing our conversational AI technology to build this pretty massive human-in-the-loop pipeline to not only build conversational AI systems for our clients, but also have humans and crowd networks in the background constantly monitoring, optimizing, and retooling that AI, so that we’re always catching the latest trends, updating our language, and making sure that the bots are always up-to-date, which is one of the big problems with bots is that they decay over time. It’s like driving a car out of a lot, it loses value immediately. Once you build a bot, there’s going to be something new tomorrow that the bot doesn’t know about.
Jon Krohn: 08:00
Yeah. Like if you had a bot working in the healthcare system and it was trained a couple of years ago, it wouldn’t have the word COVID in its vocabulary, which is obviously going to be a big blind spot today.
Sinan Ozdemir: 08:14
Exactly. Yeah. That’s obviously a very well-known example of any company that has to deal with the ramifications of COVID. That word didn’t exist over a year ago. Now, all of a sudden, within a span of a couple of weeks for some companies, they had to not only learn what it was, but learn how to answer questions about it to their end users, and that’s where we help.
Jon Krohn: 08:36
Nice. So, I want to hear about what your day-to-day is like and what kinds of tools you’re using to build these amazing tools. That’s a really incredible thing to be able to be making that connection from the chatbot right through to the RPA. So, I can’t wait to dig in to that, but first, I want to discuss your favorite hobby. You’re an expert skateboarder, I understand. You’ve got the skateboard. We can see it in the video.
Sinan Ozdemir: 09:02
Yeah, I guess you can see it in the video. Well, I’m going to caveat that a little bit. When you say expert skateboarder, I definitely purchased a skateboard during quarantine hoping that I would be able to learn how to ride it very well, and that didn’t quite pan out the way I had planned. It turned out that I’m very bad at skateboarding, but I’ll often see it in my office and go, “Hey, maybe I’ll give it a try again today,” and then, “Nope. Still bad at it.”
Jon Krohn: 09:30
Nice.
Jon Krohn: 09:32
You may already have heard of DataScienceGO, which is the conference run in California by SuperDataScience, and you may also have heard of DataScienceGO Virtual, the online conference we run several times per year in order to help the SuperDataScience community stay connected throughout the year from wherever you happen to be on this wacky giant rock called planet Earth. We’ve now started running these virtual events every single month.
Jon Krohn: 10:01
You can find them at datasciencego.com/connect. They’re absolutely free. You can sign up at anytime, and then once a month, we run an event where you will get to hear from a speaker, engage in a panel discussion or an industry expert Q&A session, and critically, there are also speed networking sessions where you can meet like-minded data scientists from around the globe. This is a great way to stay up-to-date with industry trends, hear the latest from amazing speakers, meet peers, exchange details, and stay in touch with the community. So, once again, these events run monthly. You can sign up at datasciencego.com/connect. I’d love to connect with you there.
Jon Krohn: 10:44
All right. So, let’s talk about the tools that you are actually expert at. So, let’s talk about what you’re doing at Directly. I mean, I guess it would be interesting a bit to hear about having been the founder at Kylie and the CTO at Kylie how that changed as you joined Directly and what your responsibilities are now today, a little bit about your day-to-day, and then we’ll talk about the kinds of tools that your team is using right after that.
Sinan Ozdemir: 11:12
Sure. Yeah. Anyone who has worked for a startup of less than 20 people and potentially even having been a founder or a C-suite of said company under 20 people know that the shift from working at a company like that to working at a company of even over 50 people is pretty dramatic, I think, especially for myself who coming from the CTO/Founder position, my day-to-day job was whatever needed to happen to make sure that our company succeeded today, tomorrow and as far into the future as we could plan. So, that entailed making sure that our devops structure was not crumbling at the time, making sure our monitoring systems were up-to-date, and all these things.
Sinan Ozdemir: 12:04
Moving into Directly, I have been able to more focus myself on the data science AI machine learning side of things and rely on an extremely talented team of people to handle the things that I was trying to handle myself at Kylie, so the devops teams, the QA teams are all supporting the work that now my team from Kylie can focus on how do we implement the latest and greatest machine learning and AI, and move the needle a little bit in a direction that helps our clients and helps the community at large.
Jon Krohn: 12:45
Nice. Well, as much as devops and QA are important topics, given that we’re a data science podcast, I’m sure our audience is happy to hear that that’s what we’re going to be focusing on. So, yeah, tell us about that. What kinds of, obviously, without divulging any IP, what kinds of tools are you using? What kinds of models are you building? How do you link a chatbot to robotic process automation?
Sinan Ozdemir: 13:12
Yeah. So, I’ll touch on the pipeline that we deal with to make a bot useful because I think there’s a very big disclaimer going through people’s brains that bots aren’t that great. They don’t really actually solve your problem. Bots are good at getting me to a human, who’s going to then solve my problem for me.
Sinan Ozdemir: 13:35
So, I think one of the differences in approaches that we’re taking are I would say threefold. The first is we are constantly investing in our pipeline to maintain the AI. So, as we mentioned before, it’s not just building a bot, but it’s keeping the bot up-to-date, so we have a lot of structures in place to analyze raw text by trying to extract the latent structure underneath and do topic modeling, clustering to understand what are the different topics that are being discussed, and using those topics to incorporate into the bot through an entity architecture, bot responses, automatic responses. That entire pipeline is invested in very, very heavily. It’s not something that we do quarterly or monthly. We actually run this pipeline, in some cases, daily.
Sinan Ozdemir: 14:35
So, we are daily looking for new topics that may arise. That seems to some people overkill, but when it’s early March of 2020 and these new topics are starting to crop up about this pandemic around the world, it matters. Catching these things as they’re happening matters. Obviously, that’s a pretty big scale, but even at the lower scale, understanding when there’s an outage or there’s some big fatal bug in some software, that won’t really last for weeks but may last for a few hours until it’s patched.
Sinan Ozdemir: 15:14
Those kinds of things are really crucial to patch for companies because they’re really correlated to people’s perception of the success of the bot. Does the bot recognize that there’s a new topic in the world or is it just trained on the data from two years ago?
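A daily check for new topics could be approximated, in a very simplified form, by comparing word usage in today's messages against a historical baseline. The sketch below is not the topic modeling and clustering pipeline described in the episode; it is a much simpler stand-in (emerging-term detection), and the function names, thresholds, and sample messages are all assumptions made for illustration.

```python
from collections import Counter
import re


def tokenize(text):
    """Lowercase a message and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def document_frequency(messages):
    """Count how many messages each token appears in at least once."""
    counts = Counter()
    for message in messages:
        counts.update(set(tokenize(message)))
    return counts


def emerging_terms(baseline, today, min_count=2, ratio=3.0):
    """Return tokens whose share of today's messages is at least `ratio`
    times their smoothed share of the baseline messages."""
    base_df = document_frequency(baseline)
    today_df = document_frequency(today)
    flagged = []
    for term, count in today_df.items():
        if count < min_count:
            continue  # ignore one-off words
        today_rate = count / len(today)
        # Add-one smoothing so terms unseen in the baseline do not divide by zero.
        base_rate = (base_df.get(term, 0) + 1) / (len(baseline) + 1)
        if today_rate / base_rate >= ratio:
            flagged.append(term)
    return sorted(flagged)


# Hypothetical sample data: older support messages versus today's messages.
baseline = [
    "how do I reset my password",
    "I want to change my booking dates",
    "can I get a refund for my booking",
    "my password reset email never arrived",
]
today = [
    "can I get a refund because of covid",
    "is my booking cancelled due to covid",
    "covid travel ban, what happens to my trip",
    "how do I reset my password",
]

print(emerging_terms(baseline, today))  # ['covid']
```

A term flagged this way would still need a human to decide whether it deserves a new intent, entity, or response, which is the human-in-the-loop step the conversation keeps returning to.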
Jon Krohn: 15:29
Beautiful. So, do you use Python? What kinds of libraries are you doing to … What are you using to do this topic modeling and so on?
Sinan Ozdemir: 15:42
Yeah. Our main language in the data science team is absolutely Python.
Jon Krohn: 15:46
Surprise.
Sinan Ozdemir: 15:47
Surprise, right? I know. It’s this new hot thing called Python. So, yeah, Python is the main language that we utilize in our data science team. We use it both as a scripting and as a production level backend code. So, all of our data scientists are not just machine learning engineers, but they’re also contributing to the actual backend architecture to support the machine learning, which I think is really crucial. I think data science being a combination of math, computer science and domain expertise, I think it’s important to understand yourself and in which of these three areas you are the most comfortable in, but also strive to try to be a little bit knowledgeable in all three of those areas.
Sinan Ozdemir: 16:37
If I can build a great machine learning model, it would also be great if I could also understand the systems that support serving the model, and versioning the model, and understanding why all of that is important. I think that’s also crucial to the data science journey.
Jon Krohn: 16:53
I totally agree. I think that that is the ideal, and it’s nice that you have that. I think it’s also probably enjoyable for your machine learning engineers to be able to full stack like that and to have the experience of training models. I think people get a lot of enjoyment and fulfillment out of that. So, it’s nice that they can work up and down the data scientist stack.
Sinan Ozdemir: 17:13
Yeah. I think it all comes down to, I mean, no one wants to feel like they’re working in a vacuum, right? I don’t think people really enjoy the feeling of being on this team where they’re expected to be given a problem, find the data, process the data, do the feature engineering, do the modeling, spit out a model and say, “Well, I guess I’m done. Move on to the next one.” I think people really enjoy understanding, “Well, why are we doing this? What happened before? Why are you asking us to do this? What is the purpose of this model? Is this supposed to help a KPI? Is this supposed to be an internal/external model? What’s the reasoning for this?” all the way to, “Now that we’re done, where is it going to live? Let me tell you what architectures we’ll use and therefore how big it is in megabytes. How much memory they’re going to need to process this? How fast does it predict things in batch and how many can it do in a single batch?”
Sinan Ozdemir: 18:08
All of those may seem like small things, but they really matter if the person who’s going to use that model at the end is saying, “Actually, I need to process 10,000 data points at a time.” They go, “Oh, this can’t do that.”
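The batch-size concern can be made concrete with a small sketch. Assuming a model that can only score a limited number of inputs per call, a caller with 10,000 data points has to split them into chunks. `predict_batch` below is a hypothetical stand-in for a real model's batch prediction call, and the limit of 256 is an arbitrary assumption.

```python
from itertools import islice


def batches(items, batch_size):
    """Yield consecutive lists of at most `batch_size` items."""
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, batch_size))
        if not chunk:
            return
        yield chunk


def predict_batch(texts):
    """Hypothetical stand-in for a model's batch predict call.
    It returns one dummy 'prediction' (the text length) per input."""
    return [len(text) for text in texts]


def predict_all(texts, max_batch_size):
    """Score any number of inputs without exceeding the per-call batch limit."""
    results = []
    for chunk in batches(texts, max_batch_size):
        results.extend(predict_batch(chunk))
    return results


data = [f"message {i}" for i in range(10_000)]
predictions = predict_all(data, max_batch_size=256)
print(len(predictions))  # 10000
```

Knowing the real limit, and how long each call takes, is the kind of detail the person who built the model is best placed to supply.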
Jon Krohn: 18:25
Yeah. That’s beautiful. So, you mentioned to me before we started recording that you had some particular use cases from your work that you might like to share. Have we covered those already or-
Sinan Ozdemir: 18:37
I started to. We can get into a little bit more.
Jon Krohn: 18:38
Yeah. Let’s dig through it. Definitely.
Sinan Ozdemir: 18:40
Yeah. I mean, obviously, on everybody’s mind is the COVID-19 pandemic. As it relates to in the early stages, again, in March of 2020, it became really obvious to customer support experts and customer support experts being account managers, being conversational designers, machine learning engineers that there’s this new topic to understand, that the bot needs to understand or else the bot will be grossly out of date.
Sinan Ozdemir: 19:15
I think a lot of people stop there. They say, “COVID is a thing. Let’s write a response. Put out a response. Done.” We did it. We’re now automating it. However, it then became very obvious that underneath this umbrella of COVID, depending on which domain you’re in, there are these nuanced topics. So, an example for us would be a hospitality company, some are dealing with travel.
Sinan Ozdemir: 19:40
People who like to travel, obviously, got a bit of a shock when they realized, “Oh, we can’t do any of those travelings in pretty much all of 2020. So, we need a refund. We need to figure this out. What are the next steps here because no one can travel?”
Sinan Ozdemir: 19:58
Then it becomes an issue of, “Well, where were you going?” Right? So, you’re getting into these nuanced subtopics where, “Were you going to leave the country? Were you going to go to a hotspot of COVID or not a hotspot of COVID?”
Sinan Ozdemir: 20:14
Hotspots of COVID changed by the week, sometimes by the day. So, understanding all of that is more nuanced than just, “Did you say the word COVID? Here is our one-pager on how we are dealing with COVID.”
Sinan Ozdemir: 20:26
Our system, as I mentioned, we’re daily looking for new topics, is able to understand COVID, but also recognize these subtopics of there’s just this large category of people going to Disney World, and they are talking about Disney World and COVID and the rules about Disney World are very different than the rules about traveling to Italy in that time. So, understanding the differences between what city are you going to, why are you going there, when were you going to go, all of those subtopics become crucial to the company. If they’re not able to understand that latent structure of the conversations, then they’re going to find a lot of people who are frustrated, who say, “I understand that you know what COVID is, but I have a very specific problem and I need your help with it.”
Jon Krohn: 21:18
Nice. That makes a lot of sense to me. That is definitely critically important. All right. Thank you, Sinan. I love that use case. It is so important, and it shows what it’s like to be working as someone in your position where you’re needing to be managing these conversational AIs and have them be effective for all of your clients.
Jon Krohn: 21:42
Another topic that is I think really precious, something that comes up a lot in conversations, we’ve been talking about it a lot in recent podcast episodes, is this idea of AutoML. So, do you find that AutoML is helpful to you and to your team? Do you think that there’s potential in it today or in the future?
Sinan Ozdemir: 22:07
Yeah. That’s a big question. AutoML as it is today is not a technology that we are leveraging heavily at Directly or at all to be totally transparent. So, the idea of AutoML being that, you give your data to this system, and it’s a black box system, more or less, and the system says, “I got this,” and spits out a model, and sometimes does even more of the leg work of, “I’m serving it for you. Here’s an API if you want to hit it and get results back in realtime, and I’ve already tuned all of the parameters that I am aware of to optimize your model. So, you’re done.”
Sinan Ozdemir: 22:49
I think that idea is excellent. I love that idea. The idea that you could give your data to a system and that system can figure it out for you, it sounds great. The problem comes in when it becomes the de facto alternative to data scientists and machine learning engineers because there’s this notion that, “Oh, machine learning was difficult, and you had to be an expert, a PhD in Math to understand it, but now we have SageMaker, AutoML. Insert AutoML technology here. That can just do it for us. Thank goodness we can finally use that.”
Sinan Ozdemir: 23:33
I don’t think that’s the right shift. I don’t think that’s the big shift that we’re seeing as a trend, but I hope that doesn’t become the big shift because there’s always nuance. There’s always nuance in either the domain knowledge, i.e., this data isn’t fair, this data isn’t clean, this data isn’t good enough for any model to be able to interpret, so let alone an AutoML system that doesn’t have any humans in the loop. Recognizing that alone is difficult.
Jon Krohn: 24:06
Totally. I think we get this idea. I think part of why AutoML solutions might seem attractive to people is you say, “Machine learning engineers, data scientists, these are expensive positions to fill.” So, if you’re in the C-suite looking at opportunities to maybe reduce some cost, have some efficiencies or maybe even just do more with your existing resources, you say, “Okay. I’ve heard about AutoML, I’ve heard about SageMaker,” and so they say to the CTO or the director of data science, “How can we get this involved? How is this going to save us money? Does this mean that we can do more with the same amount of headcount?” It’s these issues that you’re bringing up that I … I mean, there’s this idea that some decades in the future maybe none of us need to be doing any work at all, which I don’t know how likely that really is. Maybe in our lifetimes, I don’t know.
Jon Krohn: 25:03
The idea that you can be replacing data scientists I think in the coming decades with things like AutoML, for the reasons you’re saying, I think it’s a stretch, especially if you’re going to be working at the cutting edge putting systems into production like you are. So, bias, as you mentioned, and things like spurious associations in the data are difficult even for a trained human expert to notice. To a machine, that isn’t likely to look spurious. It just is-
Sinan Ozdemir: 25:37
Useful.
Jon Krohn: 25:38
Yeah, right. Exactly.
Sinan Ozdemir: 25:39
Great. These things are correlated. I can use that information.
Jon Krohn: 25:42
Oh, use the ID number.
Sinan Ozdemir: 25:46
The art of correlation.
Jon Krohn: 25:48
Exactly.
Sinan Ozdemir: 25:48
At the same time, there are cases where I am comfortable giving data to a machine to help me build a model. So, for example, a very specific case in my field of NLP, which is where I spend most of my professional time, career, is in intent modeling, the idea that there’s a text classification task day-to-day. Text goes in, label comes out. Now, a label happens to correspond to a topic or an intent of an end user in a bot, but at the end of the day, it’s text classification.
Sinan Ozdemir: 26:24
Now, I am comfortable relying on third party tools to build an intent model for me, think things like Dialogflow as a good example. Dialogflow has an intent model feature that you can use to leverage and host and build an intent model. However, they require training data. So, the hard work is still on the human to identify which intent you want, get the training data, clean the training data, make sure your intents aren’t overlapping. You have reset password Windows and reset passwords Mac.
Sinan Ozdemir: 27:07
Those training phrases are going to collide, and it’s going to be difficult for the machine in some occasions to tell the difference between those two intents. Only a data scientist would think of that or a bot designer or someone experienced with intent models, but you can’t trust the model builder alone to understand that. So, there’s a trade off, but I think the hard work still has to be done by a human today.
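One way a person reviewing the training data might look for colliding intents is to measure word overlap between training phrases that belong to different intents. The sketch below uses Jaccard similarity over word sets. The intent names, phrases, and the 0.5 threshold are illustrative assumptions, and this is not a feature of Dialogflow or any other tool named in the conversation.

```python
import re
from itertools import combinations


def tokens(phrase):
    """Return the set of lowercase alphabetic words in a phrase."""
    return set(re.findall(r"[a-z]+", phrase.lower()))


def jaccard(a, b):
    """Jaccard similarity: shared words divided by all distinct words."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def colliding_phrases(intents, threshold=0.5):
    """Return (intent_a, phrase_a, intent_b, phrase_b, score) for every pair
    of phrases from different intents whose overlap is at least `threshold`."""
    collisions = []
    for (name_a, phrases_a), (name_b, phrases_b) in combinations(intents.items(), 2):
        for phrase_a in phrases_a:
            for phrase_b in phrases_b:
                score = jaccard(tokens(phrase_a), tokens(phrase_b))
                if score >= threshold:
                    collisions.append((name_a, phrase_a, name_b, phrase_b, round(score, 2)))
    return collisions


# Hypothetical training phrases for three intents.
intents = {
    "reset_password_windows": [
        "how do I reset my password on windows",
        "forgot my windows password",
    ],
    "reset_password_mac": [
        "how do I reset my password on mac",
        "forgot my mac password",
    ],
    "request_refund": ["I want a refund for my trip"],
}

collisions = colliding_phrases(intents)
for collision in collisions:
    print(collision)
```

With this data, only the Windows and Mac password intents are flagged, which suggests merging them into a single intent and capturing the operating system separately.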
Jon Krohn: 27:32
So, making sure that I’m understanding this properly, intent modeling is where you’re trying to understand the intent of your user, right?
Sinan Ozdemir: 27:40
Exactly. So, it’s usually as simple as text classification, a text utterance goes in and then a label comes out. More complicated features will associate with hierarchical, so you can give several turns of a conversation, back and forths of a conversation and label the intent, but at the end of the day, usually, it’s text goes in, intent comes out, yeah.
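"Text goes in, intent comes out" can be illustrated with a tiny classifier. The sketch below is a minimal multinomial naive Bayes model written with only the standard library. It is an assumption chosen for brevity, not the model used at Directly or inside Dialogflow, and the training utterances and intent labels are made up.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase an utterance and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesIntentClassifier:
    """Multinomial naive Bayes over word counts with add-one smoothing."""

    def fit(self, utterances, labels):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.label_counts = Counter(labels)      # label -> number of examples
        self.vocabulary = set()
        for text, label in zip(utterances, labels):
            words = tokenize(text)
            self.word_counts[label].update(words)
            self.vocabulary.update(words)
        return self

    def predict(self, text):
        words = tokenize(text)
        total_examples = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_examples in self.label_counts.items():
            # Start from the log prior, then add a smoothed log likelihood per word.
            score = math.log(n_examples / total_examples)
            total_words = sum(self.word_counts[label].values())
            for word in words:
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(self.vocabulary)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training data: (utterance, intent) pairs.
training = [
    ("I forgot my password", "reset_password"),
    ("how do I reset my password", "reset_password"),
    ("I want my money back", "request_refund"),
    ("can I get a refund for my booking", "request_refund"),
    ("cancel my reservation", "cancel_booking"),
    ("I need to cancel my booking", "cancel_booking"),
]
texts, labels = zip(*training)
model = NaiveBayesIntentClassifier().fit(texts, labels)

print(model.predict("please reset my password"))  # reset_password
print(model.predict("I would like a refund"))     # request_refund
```

The model itself is the easy part here. Choosing the intents and collecting clean, non-overlapping training phrases is the work that stays with a person.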
Jon Krohn: 28:03
Nice. Okay. I get it. I get it. Then so the idea is that a machine on its own can be very good at identifying these patterns like password recovery, I think was the example you gave, but that something system-specific be mindful of what the operating system is. Is this something that we might need to nudge an algorithm in the right direction or maybe even hard code in a way to say if a particular operating system comes up, that is something that needs to be treated very special and it’s highly indicative of the topic that the person is interested in, the intent that this person has with the system.
Sinan Ozdemir: 28:39
Yeah. A lot of the times, it’s a secondary model, where you’re trying to extract what are called entities from the text. So, you have the intent, which is reset password, and then you have your entities, which would be your operating system. A bot would then have to use both of those things to be able to give you the best response, but the point is if you rely solely on the intent modeling system as an AutoML saying, “Here’s all the data we came up with,” which is in and of itself difficult to get all that data and compile it, you shouldn’t just trust the model to know the best way to separate out into these different intents. You have to measure the model. You have to constantly optimize it or move training phrases, add training phrases, remove intents, add intents. So, there’s always a large human-in-the-loop component to that.
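The split between intent and entities can be sketched as a lookup keyed on both. In the example below, the entity extractor is a simple keyword match standing in for the secondary model mentioned above. The entity vocabulary, intent names, and response texts are all hypothetical.

```python
import re

# Hypothetical entity vocabulary: surface forms mapped to a canonical value.
OPERATING_SYSTEMS = {
    "windows": "windows",
    "mac": "mac",
    "macos": "mac",
    "macbook": "mac",
}

# Hypothetical responses keyed by (intent, operating system entity).
RESPONSES = {
    ("reset_password", "windows"): "Here is the Windows password reset guide.",
    ("reset_password", "mac"): "Here is the Mac password reset guide.",
    ("reset_password", None): "Which operating system are you using?",
}

FALLBACK = "Let me connect you with a human expert."


def extract_operating_system(text):
    """Return the canonical operating system mentioned in the text, or None."""
    for token in re.findall(r"[a-z]+", text.lower()):
        if token in OPERATING_SYSTEMS:
            return OPERATING_SYSTEMS[token]
    return None


def respond(intent, text):
    """Pick a response using both the predicted intent and the extracted entity."""
    os_name = extract_operating_system(text)
    return RESPONSES.get((intent, os_name), FALLBACK)


print(respond("reset_password", "I can't log in to my MacBook"))
print(respond("reset_password", "I forgot my password"))
print(respond("request_refund", "I want my money back"))
```

Keeping one `reset_password` intent and treating the operating system as an entity avoids the colliding Windows and Mac intents discussed earlier, and a missing entity becomes a natural prompt for a follow-up question.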
Jon Krohn: 29:32
Beautiful. So, when you are building your systems with the kinds of NLP expertise that you have and that you use every day, what are you looking for in the people that you would like to work with? What kinds of skills do you want in the people that you hire? If somebody wants to be an NLP expert like you are or be building conversational AIs, getting involved with robotic process automation, what should they be doing? How can they develop the hard skills or the soft skills needed to work alongside you?
Sinan Ozdemir: 30:08
Sure. Yeah. It’s a great question. It’s a great question because I don’t get to talk a lot about the other side of the skills that I personally tend to look for when considering people to join my team because it’s not always about, “Do you know machine learning pipeline structure? Do you know how to formulate a machine learning problem? Do you know how to evaluate a machine learning problem?” Those are all extremely important. Don’t get me wrong. I look for those. We look for those.
Sinan Ozdemir: 30:40
What I’m also looking for is, especially, and again, I’m speaking as someone who has worked mostly in smaller companies less than 100 people, one of the skills I look for the most is, “Can you teach me something that I don’t already know?” That, to me, answers two questions. One is I’m not good at everything. I can’t be. No one can be. So, I’m looking for someone to supplement a part of knowledge in the world that our team currently doesn’t possess. So, when I say me, I mean my team. Can you teach our team something that we don’t know? That will show us that you’re bringing something to the table that we didn’t really consider or we don’t know too much about, and I think that’s crucial.
Sinan Ozdemir: 31:35
The second thing it considers is how good is this person at explaining what’s in their head out loud to a team of people and then does the team of people then know how to take action on what that person said. So, what I sometimes do is I’ll organize a panel of different team members from across different fields in my company, so machine learning, conversational design, analysis, pure backend engineering. I’ll prompt the interviewee in advance, and I don’t do it on the spot, and I’ll say, “I’m hoping you can create a two, three slide deck and I would love it if you could spend 15 minutes explaining either your process on how you solved our take home problem or explain a topic that you deal with at work or your hobby that you’re just passionate about, interested in, as it relates to machine learning and data science.”
Sinan Ozdemir: 32:38
We say, “Keep in mind, the people that you will be talking to are not all machine learning engineers. Some will be. So, can you explain your work to a machine learning engineer and also a non-machine learning engineer in the room at the same time?”
Sinan Ozdemir: 32:55
That skill, to me, is crucial. As someone who used to be a teacher, I find that skill to be really crucial because it, for me, correlates to how quickly can you and I get to a problem solved together because I’m going to be trying to explain something to you, you’re going to explain something to me, and we have to understand each other and iterate on each other’s work to be able to get to the problem being solved. That for me is a big correlation to, “Can you teach me something, please? I’m going to ask you questions to make sure I understood what you taught me.”
Jon Krohn: 33:30
Beautiful. So, there’s two tiers to this, and correct me if I’m wrong on understanding this, but the first tier with this kind of assessment is that you’re looking for somebody who has some kind of domain knowledge that complements your existing teams, and then the second tier is that not only do they have that knowledge, but they can impart that knowledge effectively to a broad range of people, which not only demonstrates they have the domain knowledge, but also demonstrates that they’re an effective communicator.
Sinan Ozdemir: 34:02
Exactly. Again, this is all caveated with this is extremely crucial at smaller companies, where every person you bring on is vital, every new person you bring on to your team is vital. It becomes so important to understand how well are we going to work together and exactly as you said, are you an effective communicator? Are you bringing something to the table that we previously had not even considered in some cases? I find myself on interviews writing down what they’re saying and circling it, saying, “I never thought of that. That’s really interesting.” That’s the kind of light bulb moment I look for, and those are the people that I’m most excited about potentially bringing on to the team.
Jon Krohn: 34:49
Beautiful. I’m not going to get into the detail, but I do something slightly similar with my interviews. So, I totally appreciate the way you’re doing it, and I love it. So, speaking of teaching and being a good teacher, you yourself have quite a bit of background at teaching. So, you’ve taught at General Assembly in San Francisco and Washington, D.C. You’ve taught at Johns Hopkins as a TA and an adjunct lecturer. You’ve taught at Goucher College in Maryland, and you’ve taught in prison.
Sinan Ozdemir: 35:21
I have, yes. Yeah. I was teaching in Maryland Prison Systems for about two years. It was actually a program through Goucher College, where we were teaching courses for college credit. So, you’re actually able to offer college credit to the people taking the course. If I’m being honest, it was actually one of the best teaching experiences of my life, even considering Johns Hopkins and boot camps like General Assembly in that, yes, I have taught a wide variety. I even used to teach online high school, online AP Calculus and Statistics for a while. Yeah. That was my full-time job.
Jon Krohn: 36:07
So, if I remember correctly, most of these roles, so, okay, AP Calculus, AP Stats, and then I think a lot of these jobs like General Assembly, you’re teaching analytics and data science and this kind of thing.
Sinan Ozdemir: 36:21
Yes.
Jon Krohn: 36:23
What were you teaching in prison?
Sinan Ozdemir: 36:26
So, as part of the prison program, I was still teaching Math. That was my main domain.
Jon Krohn: 36:31
Cool.
Sinan Ozdemir: 36:32
I was teaching, I mean, AP Calculus. I taught AP-
Jon Krohn: 36:36
Wow!
Sinan Ozdemir: 36:37
Yeah. I was doing AP Calculus-level Calc I, obviously, in college, Calc I and Calculus classes, Algebra classes. It was arranged over a couple of years, but these are college courses that we were offering credit for. So, I’m teaching Calc I the same way I would teach Calc I in any university.
Jon Krohn: 36:59
Wow! Okay. So, there’s internal programs at the prisons where they’re studying towards a degree as well?
Sinan Ozdemir: 37:07
Absolutely, college degree.
Jon Krohn: 37:08
Wow. Cool.
Sinan Ozdemir: 37:10
Absolutely. Yeah.
Jon Krohn: 37:13
This is long before COVID, so you’re actually going in and just lecturing in a classroom. It just happened to be-
Sinan Ozdemir: 37:20
Yes. Absolutely. Yeah. A while ago, this is about I guess five-ish years ago, a little bit more now. So, it’s been a while, but, yeah, it’s totally in-person. I would drive to the prison a couple of times a week for office hours and the actual classes. I’d go in and teach a classroom. I had my board. We had books, and I taught Calculus, yeah.
Jon Krohn: 37:46
Wow! That’s cool. That opens my eyes to something I just didn’t know went on, and I really appreciate having my eyes opened to that.
Sinan Ozdemir: 37:55
From what I remember, it was one of the few institutions that offered this level of courses outside of getting a GED, which I think is fairly common. So, it was one of the few systems where the incarcerated folk could take courses in an effort to get college credit and actually get a degree and use that degree. So, it was a very rewarding experience. I think about it often and I actually really miss it.
Jon Krohn: 38:27
Wow! That’s cool. Again, for our listeners outside of the US, GED is a high school equivalent diploma, right?
Sinan Ozdemir: 38:37
Yes. Yes. It’s a one step beyond.
Jon Krohn: 38:41
Yeah, yeah, yeah. So, not maybe the most common thing in prison, but not uncommon and I can only imagine a rewarding experience for the incarcerated folks and super interesting to hear that even for you, one of the most enriching teaching experiences.
Sinan Ozdemir: 38:59
Yeah. Obviously, I’m not an expert in criminal justice. I am simply someone who really loves Math and teaching. When I was given the opportunity to teach Math in a new environment to people who really wanted to learn, I just jumped at it and the students were some of the most … They were willing to learn so much more than some of the other courses I would teach between middle school all the way to university. So, it was just every day I wanted to be there and I was just really happy to be able to do that.
Jon Krohn: 39:37
That’s beautiful. So, related topic, you’re talking about teaching Math and Calculus. So, your formal training, your formal education is in pure Math. So, do you want to tell us a bit about what you’re looking at in pure Math, how that contrasts with applied Math and how, despite data science being an almost entirely applied Math area, the pure Math that you learned is useful as a data scientist?
Sinan Ozdemir: 40:08
Sure. I think the debate between pure Math and applied Math comes up a lot, especially when you’re someone that comes from that background. They always ask, “Why didn’t you do applied Math?” or to the applied people, “Why didn’t you do pure Math?” Honestly, every time I hear that debate or conversation, in the back of my mind I’m going, “We all like Math. I don’t know why we’re fighting about it. We all like Math, right? This is why we’re here. I don’t understand why we have to divide over this.”
Sinan Ozdemir: 40:38
I guess at the end of the day, the main difference, at least how I experienced college at Johns Hopkins, was that the applied Math pathway was focused more on statistics, probability, hypothesis testing, these kinds of methodologies, as opposed to the pure Math track, which, obviously, I know more about because I was on it, focused on proof solving and how to break down the rules of Math to its very core and then build it back up to understand new concepts in mathematical thinking.
Sinan Ozdemir: 41:21
So, how do you take the idea of integers, one, two, three, four, five, break it down to its rules and then try to apply those rules to another system or a number system and see if it still works. I think there’s a lot there, and to your point about data science being more of an applied Math technique, I think that is true today, but I think there are a lot of corollaries from abstract pure Math that can also be taken into account in machine learning, the idea that there are … Calculus is often considered both pure and applied Math because, again, it’s all Math. Calculus is a huge component of understanding the inner workings of deep learning as is Algebra, which, again, can be considered both pure and applied Math. So, there’s a lot of takeaways from both tracks in data science.
Jon Krohn: 42:21
Beautiful. Yeah. So, proof solving and becoming competent with that can be useful across the board in your data science career and maybe even just in life in general.
Sinan Ozdemir: 42:33
Yeah. I think so. The ability to understand where you’re starting from, point A, see in general where you’re trying to go, point B, and figure out iterative logical steps to get from A to B, and not just figuring out the steps, but being able to adjust those steps as you acquire more information, is such a combination of the proof solving in pure Math with the hypothesis updating, the prior and posterior knowledge, in applied Math, right?
Sinan Ozdemir: 43:05
I can come up with this proof, but as I go, I’m going to have to adjust to actually get there along the way as I get new information. So, it’s really a combination. It’s all Math to me.
Jon Krohn: 43:16
I like that. So, I’m doing a lot of thinking of this and I’m like, “All right. I really want to go on a date with this person,” and that’s the point I want to get to. I’m going to prove how I can get there, even if-
Sinan Ozdemir: 43:29
Yeah. If I’m being honest, this is maybe a different podcast. I’ll try a different technique.
Jon Krohn: 43:35
I can Math my way onto those dates that I can’t get.
Sinan Ozdemir: 43:38
I haven’t been able to figure it out.
Jon Krohn: 43:42
Nice. Well, bringing some Math into real life, I mean, maybe not exactly a dating example, but you love Math so much. I mean, maybe that should be obvious to our listeners by this point that you have several Math tattoos.
Sinan Ozdemir: 43:56
I do, yeah. I do.
Jon Krohn: 43:58
Do you want to tell us about them or maybe I can even get you, for our YouTube viewers, to show some of them off.
Sinan Ozdemir: 44:05
Sure. Can’t show all of them off, unfortunately, but, actually, we just talked about proofs. One of my more recent tattoos, I think I can get to it. Here it is.
Jon Krohn: 44:17
Oh, nice.
Sinan Ozdemir: 44:19
It’s a reference to my favorite proof, which is a pretty simple proof for those who are on the Math track. It is the proof that the square root of two is irrational. So, its decimal expansion never ends and never repeats, and the proof is a type of proof called contradiction, which is one of the main styles of proof, which is you assume the end result is actually false. You assume the contradiction, and then you arrive at a falsehood. So, in this case, what if the square root of two was rational, and then you try to work your way to a contradiction, which is why it’s called proof by contradiction. I think that a lot of the time it is how I think about a lot of problems, both professionally and personally, is, “Okay. I want a proof that …” I’ll make up an example. I’ll try on the spot.
Sinan Ozdemir: 45:16
I want to show that this bot isn’t working, let’s say. I’ll try to be as vague as I can. Well, let’s assume it was working. What would I expect to be true if it was working? Well, I’d expect the precision score to be above 80%, okay? Well, let me go check the precision score. It’s 60. The bot isn’t working. That’s a very, very generalized example. It does influence a lot of the ways that I think about problems. Okay. Well, let’s assume the opposite and let me see if I can arrive at a contradiction, and that will hopefully reinforce my belief about the problem at hand.
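The reasoning in this example can be sketched in Python. This is only an illustration: the 80% threshold and the observed 60% come from the conversation, while the function names and the label data are hypothetical. Strictly speaking, a failed expectation is evidence against the "bot is working" assumption rather than a formal proof, but the shape of the argument is the same.

```python
def precision(predicted, actual):
    """Precision = true positives / all predicted positives."""
    true_pos = sum(1 for p, a in zip(predicted, actual) if p and a)
    pred_pos = sum(1 for p in predicted if p)
    return true_pos / pred_pos if pred_pos else 0.0


def contradicts_working_assumption(predicted, actual, threshold=0.80):
    """Assume the bot works, which implies precision >= threshold.

    Returns True when the observed precision violates that expectation,
    i.e. the assumption has led to a contradiction.
    """
    return precision(predicted, actual) < threshold


# Hypothetical data: 1 means "bot answered" (predicted) or
# "the answer was actually correct" (actual).
predicted = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
actual = [1, 1, 1, 0, 0, 0, 0, 1, 0, 0]

score = precision(predicted, actual)  # 3 correct out of 5 answers
bot_is_broken = contradicts_working_assumption(predicted, actual)
```

With this made-up data the bot answered five times and was right three times, so the precision is 60%, below the 80% that a working bot was assumed to reach.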
Jon Krohn: 45:56
Oh, I love this. I mean, I come from an applied Math background. So, I haven’t done a lot of proofs, and hearing you talk about it, it makes me really excited about it. It’s like this whole new world that I’ve got to explore. I can see a great value in it.
Sinan Ozdemir: 46:09
Yeah. Honestly, it doesn’t take a pure mathematician to appreciate a lot of these proofs. If you go on YouTube and just look up square root of two irrational, you’ll find several two-minute videos. This is not a long proof. It is about 10 lines of Math, and they’re all pretty followable. There are two-minute videos of people just walking through it, “Here’s how it is,” on the back of a napkin, “Here’s why the square root of two is irrational, and that’s it. Have a good day.”
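For reference, the standard proof by contradiction mentioned here runs roughly as follows. This is the usual textbook version, not necessarily the exact form shown in the tattoo or in any particular video.

```latex
\textbf{Claim.} $\sqrt{2}$ is irrational.

\textbf{Proof.} Assume, for contradiction, that $\sqrt{2}$ is rational. Then
\[
  \sqrt{2} = \frac{p}{q}
\]
for integers $p, q$ with $q \neq 0$ and the fraction in lowest terms
(so $p$ and $q$ share no common factor). Squaring both sides gives
\[
  p^2 = 2q^2 ,
\]
so $p^2$ is even, and therefore $p$ is even. Write $p = 2k$ for some integer $k$. Then
\[
  (2k)^2 = 2q^2 \quad\Longrightarrow\quad q^2 = 2k^2 ,
\]
so $q^2$ is even, and therefore $q$ is even. Now $p$ and $q$ are both
divisible by $2$, contradicting the assumption that $p/q$ is in lowest
terms. Hence $\sqrt{2}$ is irrational. $\blacksquare$
```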
Sinan Ozdemir: 46:36
So, you’re going to appreciate these proofs without having the intricate knowledge of the set of rational numbers and what that all means. You just will appreciate it, I think. Yeah, and the other ones are references to Game of Life, Automata. There’s a bunch.
Jon Krohn: 46:52
Nice. Well, that’s cool. I mean, it was great to take a deep dive into one of your tattoos. Maybe next year when we have you on the show again, it seems inevitable that you’ll be on the program again soon.
Sinan Ozdemir: 47:00
One tattoo a year, yeah. That’s a promise.
Jon Krohn: 47:02
Exactly. That sounds good. All right. So, that’s beautiful. I love what we’ve covered today. We always end the program with the same question. So, what books are you reading right now or what books do you recommend for our audience?
Sinan Ozdemir: 47:23
Same question every time. I am more prepared than ever.
Jon Krohn: 47:27
Same answer every time. No?
Sinan Ozdemir: 47:28
I’m more prepared than ever this time.
Jon Krohn: 47:31
Nice. Wow! We can see it on YouTube. We see exactly what book you’re talking about.
Sinan Ozdemir: 47:34
Yeah. So, it’s called Designing Voice User Interfaces. It’s an O’Reilly book. The subtitle is actually I think more indicative of what it’s about, The Principles of Conversational Experiences. This is actually a book recommended to me by a colleague. She’s an extremely talented conversational design, bot architecture design, intent modeling expert. She recommends this book to a lot of people, especially machine learning engineers dealing with bots, chatbots, conversational AIs because-
Jon Krohn: 48:05
What’s her name?
Sinan Ozdemir: 48:06
Lauren Senna. She is an excellent person, excellent colleague. She recommends this book because it doesn’t just talk about the actual machine learning behind everything, it more gets into the design of the conversation, why the channels matter, why it matters who your end users are, how to talk to different end users. I mean, don’t get me wrong. There’s sections on advanced natural language understanding, but there’s also sections on context and why it’s important to keep track of context, vulgarity and how do you deal with that.
Sinan Ozdemir: 48:40
It’s a lot of these principles about how to have an automated conversation, which going back to the beginning of the episode, it’s not just about your hard skills of can you build a model, can your AutoML build you a model. There’s a lot of things to keep in mind that, for now, only a human can really do.
Jon Krohn: 48:58
Nice. Well, your book recommendation is not only one that I highly value because that sounded super interesting, and it sounds like a very well-written book, but also because you’ve written four books since 2016. So, in the last five years, you’ve had four books come out and I heard a rumor you might be working on a fifth. Do you want to divulge any of that to our audience?
Sinan Ozdemir: 49:19
Yeah. I am working on a new book in 2021. My goal is to finish it up, get it out there by the end of 2021, but it is a feature on … Well, it’s a feature engineering book.
Jon Krohn: 49:35
A feature on features.
Sinan Ozdemir: 49:37
A feature on feature engineering. It is effectively a book of case studies that I’m working on that will be working through different feature engineering ideas, different domains, image, healthcare, NLP, stock trading even though stock trading is very volatile, but there’s lessons to be learned in the time series data that it offers. So, working through different examples.
Sinan Ozdemir: 50:04
The core of the book is there’s a whole world of data science before we even introduce your supervised machine learning model, your classification, your regression, you name it, to the whole world, and that world is getting data, evaluating its fairness. There’s evaluating the features that are redundant or that are dependent on other features. There’s interpolating missing data. There’s this whole world of work to do before you call your dot fit method. So, the book really focuses on that hidden world of feature engineering, and it tries to do so by working through different examples, but working on it this year, and I’m excited to hopefully get it out by the end of the year.
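Two of the pre-modeling steps named here, interpolating missing data and dropping features that are redundant with other features, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not material from the book: the column names, the data, the helper names, and the 0.95 correlation threshold are all hypothetical, and fairness evaluation is not covered.

```python
import math


def interpolate_missing(values):
    """Fill None gaps linearly between the nearest known neighbours.

    Leading or trailing gaps are filled with the nearest known value.
    """
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("column has no known values")
    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = values[right]
        elif right is None:
            filled[i] = values[left]
        else:
            frac = (i - left) / (right - left)
            filled[i] = values[left] + frac * (values[right] - values[left])
    return filled


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length columns."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def drop_redundant(columns, threshold=0.95):
    """Keep a column only if it is not highly correlated with one already kept."""
    kept = {}
    for name, col in columns.items():
        if all(abs(pearson(col, other)) < threshold for other in kept.values()):
            kept[name] = col
    return kept


# Hypothetical raw data: temp_f is just temp_c in different units.
raw = {
    "temp_c": [20.0, None, 24.0, 26.0],
    "temp_f": [68.0, 71.6, 75.2, 78.8],
    "visits": [3.0, 1.0, 4.0, 1.0],
}
clean = {name: interpolate_missing(col) for name, col in raw.items()}
features = drop_redundant(clean)  # only now would a model's fit step follow
```

Here the missing temperature is filled in as 22.0, and `temp_f` is dropped because it carries the same information as `temp_c`. In practice a library such as pandas would usually handle both steps; the hand-written version is only meant to make the logic visible.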
Jon Krohn: 50:50
Amazing. Yeah. It is such an important topic. I agree 100%. It doesn’t get enough focus in books. So, I think there’s a great market for this. I can’t wait to read it. So, for people who are interested in knowing about your future book launches or anything else you’re working on or maybe just starting a conversation with you, how can our listeners contact you, follow you, find you?
Sinan Ozdemir: 51:09
Yes. I’m not that active on Twitter, but I do have a Twitter, Prof_Oz. People can find me on the traditional social media, LinkedIn and Twitter, obviously. I don’t currently have a homepage, but that is something I also hope to change in 2021.
Jon Krohn: 51:31
Nice. That should be easier than writing a book.
Sinan Ozdemir: 51:34
It’s not great podcast material, sorry. I don’t have a website, but I will one day, I promise. Please, obviously, find me on LinkedIn and Twitter. I am active enough that I will read something that you write and, hopefully, find time to reply.
Jon Krohn: 51:54
Nice. Sinan, I’ve learned so much. Beautiful to have you on, and can’t wait to have you on again soon.
Sinan Ozdemir: 52:00
Thanks so much for having me. I look forward to our fourth time together.
Jon Krohn: 52:05
Perfect. We’ll see you then.
Jon Krohn: 52:12
Well, there you are. We sure were lucky to have Sinan visit us for a third time on the podcast. Our primary focus this time around was on the effective design and engineering of conversational AI with practical tips such as updating the chatbot’s vocabulary and topics on a daily basis to ensure it’s up-to-date on the terminology of the latest world events, and linking the chatbot to robotic process automation, so that it’s actually doing something in the real world such as increasing efficiency within your business.
Jon Krohn: 52:42
We also talked about how useful it can be if machine learning engineers are able to perform the role of a data scientist, thereby working all the way up and down the data science stack from modeling through to deployment, how AutoML is unlikely to be taking data scientist jobs away in the foreseeable future, how essential it is in a relatively small company to bring your own expertise, your own unique expertise to the table and to be able to communicate that expertise to people outside of your domain.
Jon Krohn: 53:11
We talked about incarcerated people being especially eager students of university level Math. We talked about Sinan’s Math equation tattoos, and specific examples of how the mindset developed for solving mathematical proofs comes in handy both with a data science career, as well as in life in general.
Jon Krohn: 53:30
As always, you can get all the show notes including the transcript for this episode, the video recording, any materials mentioned on the show and URLs to Sinan’s LinkedIn and Twitter profiles, as well as my own LinkedIn and Twitter profiles at www.superdatascience.com/445. That’s www.superdatascience.com/445.
Jon Krohn: 53:50
When you add us on LinkedIn, it might be a good idea to mention you are listening to the SuperDataScience Podcast so that we know you’re not a random salesperson. If you enjoyed this episode, kindly leave a review on your favorite podcasting app or on YouTube, where you can enjoy a high-fidelity video version of today’s program. In it, you can see the smiles and laughs that we had today and you can also see Sinan’s skateboard, his chess table, and his tattoos.
Jon Krohn: 54:17
I also encourage you to tag me in a post on LinkedIn or Twitter to let me know your thoughts on this episode. I’d love to respond to your thoughts in public and get a conversation going. All right. It’s been so great. Thank you for listening. Looking forward to enjoying another round of the SuperDataScience Podcast with you very soon.