SDS 509: Accelerating Start-up Growth with A.I. Specialists

Podcast Guest: Parinaz Sobhani

September 28, 2021

We talk about what Parinaz looks for when hiring, discuss the ethics of AI, dig into her academic research (specifically her PhD in AI with a concentration in natural language processing), and more!

About Parinaz Sobhani
Parinaz Sobhani is the head of Machine Learning and applied research on the Georgian R&D team and is responsible for leading the development of cutting-edge machine learning solutions for growth-stage startup companies. Parinaz holds a Ph.D. from the University of Ottawa with a research focus on solving natural language understanding problems using deep neural network techniques. She has more than 10 years of experience in developing and designing new models and algorithms for various artificial intelligence tasks. Before joining Georgian, Parinaz worked at Microsoft Research, where she developed end-to-end neural machine translation models. She has also worked for the National Research Council of Canada, where she developed deep neural network models for sentiment analysis.
Overview
Parinaz works for Georgian, which she describes as a company trying to disrupt the investment industry by investing in B2B SaaS companies. They assist companies post-investment to accelerate growth and achieve product differentiation. Start-ups don’t have the monetary resources to build out their teams and progress the way a large company like Amazon or Google can. That’s where Georgian helps.
Parinaz came to the company as its first machine learning hire and worked primarily with code before getting her hands on data. The result was figuring out that their work is not about consultation; it’s about collaboration and partnership, educating and augmenting the teams of their portfolio companies, whom, notably, they don’t charge for the service. It’s rewarding to make a company’s success your success and to be impactful for companies solving important and complicated problems for the wider world of consumers. They offer product strategy, hackathons (what they call “deep engagements”), and help both with implementing machine learning and with scaling the work of companies that already use it. The trick is helping a company pressed to become profitable fast when it may not have time to tackle this more technical and nuanced work. Importantly, Georgian never invests in competing companies.
As far as tools and approaches, Parinaz and Georgian stay on top of the libraries and tools coming out of the open-source community. One of her favorites is Hugging Face. I personally love Hugging Face, an incredible group that comes out with state-of-the-art tools almost daily. They utilize the usual suspects: Python, TensorFlow, PyTorch, and others. They even open source some of their own code and work. From there we got into the ethical concerns of training large computational models and what that can do to the environment. The good news is that reducing carbon footprint is not only good for the environment but also cost-effective for the company. Distilled models such as DistilBERT, a compact version of BERT (a model that I love), are examples of models that lower their carbon footprint while delivering much of the same output as larger and more energy-draining models.
In hiring, Parinaz is looking for individuals who are motivated by impact. Is what drives you actually solving the problem, or are you simply interested in the technology? Parinaz looks for honesty and for people who won’t get caught up in technology for its own sake. The word is pragmatic. She also wants quick learners who can jump between markets and quickly understand an industry without handholding. Obviously, the basics are the engineering and tech skills to write good code, as well as the ability to communicate effectively and persuasively. This was a great segue into Parinaz’s why in her work. She fell in love with code in high school as a way to solve problems. In her collegiate and PhD work, she discovered machine learning and was excited to learn that parts of her coding work could be automated.
We closed out with a discussion on fairness in AI, something Parinaz is extremely interested in promoting. We know now that models, like humans, can be sexist or racist, because they are built from human-generated data, making it impossible to get an objective source of truth at scale. When the main ingredient is biased data, your optimization can copy those biases, depending on your objective. Parinaz is excited that we can move the needle thanks to current research on debugging models and studying the root causes of bias, starting with something as simple as being transparent and honest about a model’s biases and blind spots.

In this episode you will learn:
  • Parinaz’s work at Georgian [5:35]
  • Use cases of Georgian’s work [14:35]
  • Tools and approaches Parinaz uses [32:27]
  • Environmental concerns of machine learning [42:52]
  • Hiring at Georgian and what Parinaz looks for [48:18]
  • How did Parinaz become interested in this? [56:19]
  • Fairness in AI [1:09:01]
  
Episode Transcript


Jon Krohn: 00:00:00

This is episode number 509 with Parinaz Sobhani, head of machine learning and applied research at Georgian. 
Jon Krohn: 00:00:12
Welcome to the SuperDataScience podcast. My name is Jon Krohn, chief data scientist and bestselling author on Deep Learning. Each week, we bring you inspiring people and ideas to help you build a successful career in data science. Thanks for being here today, and now let’s make the complex simple. 
Jon Krohn: 00:00:42
 Welcome back to the SuperDataScience podcast. Today, we’re joined by the game-changing machine learning practitioner and AI expert Parinaz Sobhani. Parinaz is head of machine learning and applied research at Georgian, a financial technology company that is the largest private investment fund in Canada. They make venture capital and private equity investments. Georgian leverages data and machine learning not only to drive their investment decisions but, revolutionarily, to provide the companies they invest in with top-tier AI experts to collaborate on high-impact projects. This isn’t consulting. It’s a service. It’s a perk provided by Georgian to the 40-odd companies in its investment portfolio, each of which are themselves well-funded high-growth tech startups in a broad range of industries from insurance to law to real estate and dozens more. 
Jon Krohn: 00:01:37
In today’s episode, Parinaz details how Georgian’s innovative AI special ops approach works with several fascinating real-life examples. In addition, she fills us in on the tools and techniques her team leverages, with a particular focus on efficient and powerful techniques that combine transformer-based models of natural language with transfer learning. We talk about what she looks for in the data scientists and machine learning engineers she hires. We talk about the ethics of AI, such as environmental and socio-demographic considerations. And we dig into Parinaz’s academic research. She holds a PhD in AI from the University of Ottawa where she specialized in natural language processing. Today’s episode is somewhat technical so may appeal especially to practicing data scientists, but we put special effort into breaking down technical sections so that hopefully anyone can get the gist and learn a lot. Today’s episode will also appeal greatly to anyone interested in accelerating the impact tech startups can have by augmenting their core team with external AI specialists. All right. You excited for this episode? Let’s get to it. 
Jon Krohn: 00:02:54
Parinaz. Welcome to the SuperDataScience podcast. I am delighted to have you here. You’re such an interesting person, and I can’t wait to dig into the work you’re doing now, the work you’ve done in the past. It’s all so fascinating. Parinaz, tell us. Where in the world are you calling in from today? 
Parinaz Sobhani: 00:03:13
Hi, Jon. Nice to meet you, and thanks for having me. I’m calling from Toronto. 
Jon Krohn: 00:03:21
Yeah. We talked about this briefly just before we started recording. So I was born and raised in Toronto and … Or, as people in the rest of the world pronounce, Toronto. The most populous city in Canada. And I hadn’t been here in a couple of years because of the pandemic. I couldn’t travel from New York to Canada. But the borders opened up a little. We’re filming early September, and the borders opened up a couple of weeks ago. I was able to finally come home and visit my family in Canada. And so, I’m actually like an hour drive away from you in Waterloo, and had we coordinated better we could have shot in person. That would’ve been fun. 
Parinaz Sobhani: 00:04:01
Definitely. I wish we had done it, but that’s okay. 
Jon Krohn: 00:04:04
Yeah. 
Parinaz Sobhani: 00:04:04
We all got used to it. 
Jon Krohn: 00:04:07
It’s all right. We got used to doing everything … Exactly. And yeah, it doesn’t take away from what is going to be an incredible episode. So I know you through Maureen Teyssier. So Dr. Teyssier was on episode 479. An extremely interesting episode. We talked about her work in industry. We also talked about her PhD research, which is similar to what we’re going to end up doing with you on this episode today, Parinaz. So I know Maureen from being in New York from … I used to run a … I guess, I still run. But because of the pandemic, I haven’t been doing it. This deep learning study group and Maureen was a regular contributor. She did some presentations there. She was pretty much always attending and a really valuable contributor to this group. She’s somebody I respect so much. And so, when after the program she said, “I’ve got this friend, Parinaz, that you’ve got to speak to. I think she’d be a great guest on the show.” She sent me over your profile. I checked you out online and I was like, absolutely. Right away, I’ve got to have Parinaz on the show. So how do you know Maureen? 
Parinaz Sobhani: 00:05:16
That’s actually very interesting because it’s also related to what I’m doing right now in work, and we’re … Yeah. Do you want me to tell you a little bit about where I am, what I do, and how I [crosstalk 00:05:32]? 
Jon Krohn: 00:05:32
Yeah, exactly. I mean, we might as well. We were going to get to it at some point anyway, so let us know. Yeah. Tell us about where you work and what you’re doing. Of course. 
Parinaz Sobhani: 00:05:40
I work for Georgian, and we are a FinTech company disrupting investment. So we invest in B2B SaaS companies. We are a more later-stage investor, normally Series B or C. And Maureen used to work for one of our companies, Reonomy, and that’s how we met. So Reonomy is one example; we have around 30 or 40 other startup companies in our portfolio. And I’m head of machine learning and applied research, which is interesting because I don’t have an investment background and I’m definitely not working on the investment front. So our team, and our main differentiation, is about how we help our companies post-investment, and what’s our value add. Our value add is around the R&D team, or applied research team, that we have, and we help our companies really accelerate their growth. 
Parinaz Sobhani: 00:06:38
But it’s mainly around how we can help them to have product differentiation by using cutting-edge technologies. So if you think about the Amazons or Microsofts or Googles of the world, they all have access to talent. They all have access to those research labs. And that’s how they can afford innovation and research. Normally, for startups, it’s not affordable. So it’s my team’s responsibility to really stay up to date on what’s going on, what’s relevant, what’s not relevant, and how our companies can leverage these technologies to really solve their problems at scale, or have some sort of product differentiation in their competitive markets. 
Jon Krohn: 00:07:21
That is so cool. So Georgian is the name of the company, which I’d love to know … Do you know why it’s called Georgian? 
Parinaz Sobhani: 00:07:30
Why it’s called Georgian? Yeah, because we are very close to Georgian Bay. So I guess our founders love Georgian Bay, and it’s definitely one of the most beautiful parts of Ontario. So I guess that’s why it’s Georgian. 
Jon Krohn: 00:07:44
Yeah. That’s true. It is. It’s about a two-hour drive north of Toronto, and then all of a sudden you’re in this incredible cottage country with endless lakes. Lots of people have islands that you can only … They’ll have a cottage on an island that you can only get to by boat, or float plane maybe, even. That’s the fancy way. I couldn’t agree more. A lot of Hollywood celebrities will buy property around Georgian Bay and that area because it’s some of the most beautiful places you can be. 
Parinaz Sobhani: 00:08:16
Yeah, beaches. Yeah, exactly. Beaches in Ontario. So that’s why they call it Georgian. It’s been around for the last 13 years and we are the biggest private fund in Canada right now. But of course- 
Jon Krohn: 00:08:31
Really? 
Parinaz Sobhani: 00:08:32
We are the most proud of our differentiation and our disruption in this industry. We also use machine learning, AI, and data science to automate and augment our investment processes. And that’s how we also started our journey to be more analytical, to be a more ML- and data-driven company. And then, we decided, why not use these technologies? And why not apply the same expertise and help our companies as well? 
Jon Krohn: 00:09:04
Wow. Okay. So this is super cool. 
Jon Krohn: 00:09:09
This episode is brought to you by SuperDataScience. Yes. Our online membership platform for transitioning into data science and the namesake of the podcast itself. In the SuperDataScience platform, we recently launched our new 99-day data scientist study plan. A cheat sheet with week-by-week instructions to get you started as a data scientist in as few as 15 weeks. Each week, you complete tasks in four categories. The first is SuperDataScience courses to become familiar with the technical foundations of data science. The second is hands-on projects to fill up your portfolio and showcase your knowledge in your job applications. The third is a career tool kit with actions to help you stand out in your job hunting. And the fourth is additional resources such as articles, books, and podcasts to expand your learning and stay up-to-date. To devise this curriculum, we sat down with some of the best data scientists, as well as many of our most successful students, and came up with the ideal 99-day data scientist study plan to teach you everything you need to succeed. So you can skip the planning and simply focus on learning. We believe the program can be completed in 99 days and we challenge you to do it. Are you ready? Go to SuperDataScience.com/challenge, download the 99-day study plan, and use it with your SuperDataScience subscription to get started as a data scientist in under 100 days. And now, let’s get back to this amazing episode. 
Jon Krohn: 00:10:40
So I mean, the part that you just told me, that’s the obvious part to me. And that’s the part that I would’ve guessed. If you told me Georgian is the biggest private fund in Canada and we have people here who are experts in AI and machine learning, the obvious thing to me would be, okay, what that team does is it helps Georgian figure out where to invest. The second part, which was actually the first part you told me, but the second piece, where your AI and ML teams actually augment the companies that you invest in and allow them to have this differentiation, these cutting-edge capabilities, through applied AI research. That’s amazing, and I’ve never heard of anyone doing that before. Now it seems obvious, now that you’ve mentioned it. I guess that takes … I mean, that takes a lot of investment on Georgian’s part, because people who are AI and ML experts like you are not cheap, and so to even have a team in house helping with investments, that’s expensive. But then to say, we’re going to go another step further and have an even bigger team so that we can help our portfolio. That takes, I imagine in the beginning, before it was a proven strategy, that takes guts. Though I’m not surprised that it has worked out as a proven strategy. 
Parinaz Sobhani: 00:11:54
You are right. We were the very first VC [00:11:58] who had a hands-on R&D applied research team. That’s the case. There was nobody else to look to and learn from, but … And it was a journey. It was lots of experimentation, figuring things out. And I was the first machine learning hire, and I’m so proud of our journey and what we achieved. So I started and I was still very hands-on. I love code. I love to see data. I love to play with data. And I started to have some kind of advisory meetings and calls and relationships with our companies, and it was like, it’s impossible. I need data. Give me data. And then we started to experiment with this new model of how we can be more hands-on, how we can get closer and closer to the code and what they built. And it worked. 
Parinaz Sobhani: 00:12:53
Maybe not in the first or second project, because we failed a few times to really figure out that it’s more about collaboration. We are not a consultancy shop out there. We have a very special relationship with our companies. We don’t charge our companies. And it’s all about collaboration, partnership, and augmenting, and even educating and up-leveling their teams, which is very special. Nobody else is willing to teach you all this secret sauce, because they want to keep charging you. So that’s why it’s been a great journey. Working very closely with startup companies has also been very rewarding for me because I feel it’s very impactful. These startup companies are solving very, very interesting problems, and often they don’t have access to the right talent, or they can’t even afford the right talent, to really solve them using more innovation, using more cutting-edge technologies. So that’s why I’m so excited for the role and for what we are doing. 
Jon Krohn: 00:14:03
That is so cool. And I can imagine that it is exciting and getting to work on all these diverse problems and getting to work alongside people like Maureen, if that’s your typical counterpart, if you’re dealing over the course of your week with various Maureens at various high growth startup companies, all tackling different, exciting problems. Wow. That is really an exciting role, Parinaz. I am super jealous. You don’t need to tell anyone, but I’m so jealous. All right. So on that note, are there any particular use cases that you’d like to talk about that are interesting? 
Parinaz Sobhani: 00:14:41
Yeah. So over time we have tried to tailor our offerings for companies in different stages of their maturity. So ideally, we are going to work with more advanced companies. We want to also make all our companies pretty much advanced in the area of machine learning and data science, but they might be earlier. So how can we really help them to build the team? How can we really help them to identify opportunities, have some strategies, product strategies? And then, my favorite part, and our team’s favorite part, is what we call hackathons or deep engagements, where we really go very, very deep. We work for a few weeks or a few months or a few quarters with our companies and really help them to deliver and execute on the plan and accelerate that execution. 
Parinaz Sobhani: 00:15:31
So maybe I can give you an example. One of our companies, Tractable, is a computer vision company. They are in the auto insurance market, enabling insurance companies to really augment and automate processes and really make it easier for every single one of us who has an accident. So if you have an accident, normally it’s a very long and tedious process. You have to call your insurance many times, and then the body shop, and get the receipts. So it’s a very tedious process. How can we really accelerate and also automate and augment the process as much as possible? So they are defining a new process, which is mainly about you taking a few pictures of your car and then sending them over. They’re going to process those images and they’re going to give you an estimate of how much the insurance needs to pay. It’s all about the automation between insurance, the body shop, and you as someone who had an accident. They are moving to new markets, to new problems. Check out their website; they are also hiring. They’re always hiring for the best talent. They are hiring in the US and Europe. 
Jon Krohn: 00:16:51
The company was called Tractable? 
Parinaz Sobhani: 00:16:54
Tractable. 
Jon Krohn: 00:16:55
Yeah. T-R-A-C-T-A-B-L-E. Tractable. 
Parinaz Sobhani: 00:16:59
T-R-A-C-T-A-B-L-E. And so, Tractable. They are interested in moving to new markets. Imagine, for example, that their customer base is mostly in APAC or Europe, and now they are considering moving to North America. And most of these models that they already built are not going to perform really well in the new market, in the new geographic locations, because their [inaudible 00:17:29] place processes are different from one geographic location to another. And also the car types and models are very different. So that’s why it’s a problem, because it can delay onboarding new customers and it can delay the revenue that can come from these customers. It’s also not ideal for their customers. 
Parinaz Sobhani: 00:17:52
So really, how can we solve this problem using some sort of machine learning, and mainly transfer learning techniques? Because it’s not going to be the case that everything else they’ve built, or all the data they’ve collected so far, is completely irrelevant. It’s just a matter of which of it is relevant. Tractable is definitely one of the most advanced companies we have, but it was mostly about really figuring out the latest research. And given that we already had some expertise in transfer learning, working with other companies in this area, how could we help them quickly come up with techniques that would enable them to quickly onboard customers in new geographies? And of course, reduce the cost, because if you want to build a completely new model from scratch, you need to collect lots of data and annotate and label lots of data. 
Jon Krohn: 00:18:52
Beautiful. So let’s take a moment to fill in what transfer learning is for listeners who aren’t aware. Transfer learning is where we take a model, typically a big model that has already been pre-trained on a lot of data, and then we fine-tune the model weights on some specific dataset that is highly relevant to the client. I don’t know. You can probably dig into some particular details, obviously without going into anything proprietary, but … A common thing a company would do in a computer vision case, like maybe in Tractable’s case, is take a model like AlexNet, which is a deep learning model specialized for general machine vision problems. It’s been around for almost a decade now. And there’s this dataset called the ImageNet dataset that has over fourteen million labeled images. So some of those images have a label that says, “This is a car.” Others say, “This is a cat.” And those labels span more than twenty thousand different categories. So you can train AlexNet to be generally good at vision tasks by training it on the ImageNet dataset, just as an example. And then you can take that giant machine vision model architecture, which has had the advantage of all these diverse training images, and fine-tune it on some specific problem. And that is called transfer learning. So I don’t know if there’s any more detail you want to go into there, Parinaz, related to Tractable. 
Parinaz Sobhani: 00:20:26
Yeah. We also published a few blog posts about this project, so I encourage everyone to check them out. But what we also used was what we call ensemble learning. And it’s also about, okay, you have these models … And it’s much more complicated than a single model, because you have models for each part of the car. You have models for the [inaudible 00:20:49] place, paint, or I don’t know … There are all these kinds of things that can happen in the body shop. And then it’s also about object detection, really figuring out which part of the car it is, and it’s really the combination of many models. And it’s also about, you have these models for different geographic locations. How can we really combine them at the model level as well? 
Parinaz Sobhani: 00:21:12
So what you mentioned is really … one of the paths we tried, and we got some improvements from that. The other path is thinking about how we can smartly combine these models or features, or take their predictions and combine them. Or a stacked model, combining all these models in a stacked-modeling way, and it works really well. And that’s why there is no straightforward approach to solve the problem. It’s first of all really digging into the problem, really understanding their current stack and the current process, the current workflow. And then understanding what they have, what they have collected. And then digging into the research and really figuring out what’s relevant to their problem, trying and experimenting. That’s why it takes time. And that’s why most of the time startups can’t afford it, because they are really under the pressure of delivering. 
Jon Krohn: 00:22:07
Yep. So I’ll get back to that practical applied thing that you just mentioned in one second. But what you were talking about there in general, when you opened up that discussion, was ensemble learning, which is just … The word ensemble in French means together, and so it’s this idea of having models work together. And so, you mentioned one way, where you could be stacking models together, so the output from one becomes the input to another. So you could have multiple models creating inputs for a model downstream; we could stack them in that way. Or even at the same level in a stack of models, you could have multiple different approaches that combine their results and get results that are better than any individual model in that layer of the stack on its own. It sounds super cool, and it does definitely sound complicated. It doesn’t surprise me that you sometimes need to be working for months or quarters on these deep dive hackathons with your clients to get great results, because I imagine even the data set sizes could be so big that even just- 
Parinaz Sobhani: 00:23:13
Oh, yeah. 
Jon Krohn: 00:23:13
Yeah. Sharing the data just at first, and then really digging into their problem and figuring out how all the different parts that you need in the stack are going to work together. It sounds really fun. It must be nice to be able to get that done and see that this portfolio company of yours can move forward and have this big differentiating aspect. And so, on that note, as you mentioned right at the end of the last time that I let you speak … I’ve now been speaking for so long. But the last thing that you said when you were last speaking was about this idea of how startups are so pressed. Startups often aren’t profitable, so I’m sure some of the companies in your portfolio probably are profitable, but some probably aren’t. And when not profitable, there’s this amazingly intense pressure to be delivering on R&D that generates revenue. And this can be tough when you’re talking about the kind of problem that you described: ensembles of models, huge data sets, all these different layers of the stack, all of which need to be functional. And all of those pieces need to be in place maybe before the whole thing can generate any revenue. So that puts a lot of pressure on a small startup’s R&D function, and so I can see the value in having you guys as the investment company understanding that process and being able to contribute resources. It’s so cool. 
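As a toy illustration of the stacking idea discussed above, here is a hypothetical scikit-learn sketch (not Georgian’s or Tractable’s actual pipeline): two base models make predictions that become inputs to a downstream logistic-regression meta-model, with a synthetic dataset standing in for real client data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for a real client dataset.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Two base models at the same level of the stack; their cross-validated
# predictions become the inputs to the downstream meta-model.
stack = StackingClassifier(
    estimators=[
        ("forest", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("svm", SVC(random_state=0)),
    ],
    final_estimator=LogisticRegression(),
)
stack.fit(X_train, y_train)
score = stack.score(X_test, y_test)
print(f"held-out accuracy: {score:.2f}")
```

The same pattern extends to the multi-model setting described in the conversation: per-part or per-geography models each contribute predictions, and the meta-model learns how much to trust each one.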
Parinaz Sobhani: 00:24:52
Yeah, definitely. You are 100% right. It’s all about how they are always under-resourced. They have to keep delivering and growing. And most of our markets are so competitive. Every single day there is a new feature, there is a new product that your competitor has, and now there is pressure on you to also build similar products, build similar functionalities. So definitely so much pressure on teams to deliver. And then, at the other side of the spectrum, you have these problems that are definitely high value. We normally build ROI cases for every project we work on because we are, of course, technologists, but we have to be super pragmatic and practical, because we also have a pretty small team and we have to help more than 40 companies. So we normally build ROI [inaudible 00:25:46] and we normally make sure that we’re really going to work on the projects that can move the needle at the business level. 
Parinaz Sobhani: 00:25:52
But at the same time, they are high risk. That’s, I guess, common sense. Most of the time, these high-value projects are also high risk. And that’s why, because of the risk level, sometimes our companies deprioritize these types of projects. Or they are looking for someone who has a better understanding or some sort of expertise to, again, de-risk the project. Or even tell them, is it feasible? Is it not feasible? Or how, in a very short period of time, we can study the feasibility. 
Jon Krohn: 00:26:22
Totally. Yeah. That’s another really great use case that can even … I’m sure there end up being circumstances there where you can help them figure out what to prioritize, what might be feasible, how risky things are. If there’s 40 companies in Georgian’s portfolio, no doubt there will be circumstances where you have to say, “We can’t take the time to get involved in this multi-month project with two people from our team.” But you can point them in the right direction and help them out with how feasible things are. Because no doubt amongst those 40 companies, some of them probably do have amazing machine learning leaders who can do that for themselves, like Maureen. But no doubt there’s some others where they don’t have that much in-house expertise. Maybe they have an engineering team, but haven’t yet hired data scientists. And so, they think, well, I think there’s this opportunity for a model, but I don’t really have modeling experience. And they can come to you. 
Parinaz Sobhani: 00:27:21
Yeah, you are right. There are several factors when we are prioritizing the projects. One of them is what we call commonality or reusability. So if there is a problem that we can help you with, and there are four other companies that can leverage similar technologies, similar code, then it’s a win-win for all of you. And of course, we don’t invest in competing companies. So none of our companies are competitors, and that’s why we have this community of ML and data science teams that we can help to work together, work with us. And we can be that bridge to help them fill those gaps and transfer the knowledge. So that’s been one of our top priorities: always think about commonalities. Think about reusability and scaling, because in the venture capital world, there is no value in doubling your team every year. We don’t measure growth by the number of people we have in our organization. So it’s always about how we can be more efficient, how we can scale, how we can also make sure that we have this community of companies that get together and solve these types of hard problems. 
Jon Krohn: 00:28:43
Cool. That’s a whole other layer of this amazing model that Georgian has that I didn’t know about, and that is so cool. It’s obvious now that you mention it, but hadn’t occurred to me, is that by having commonality, reusability, scalability of the work that you do, it can end up having this effect for your whole community of portfolio companies. Man, yeah. That’s, as I said, really obvious now that you mention it. Really powerful and really cool. 
Parinaz Sobhani: 00:29:11
Yeah. Yeah. I told you, nobody else has done it before, and it took a few iterations. But that’s what’s really unique about our culture that I’m, again, so proud of. That culture of experimentation, always trying to disrupt and challenge ourselves: what’s the next thing for us? How can we deliver more value for our companies while staying very efficient, with a lean team? How can we think about what we call a toolkit, which is reusable code and reusable code components that we can use in the context of multiple projects? Then there is, of course, IP, and there are all these legal aspects of it. How can we figure it out, and how can we make it very transparent? 
Parinaz Sobhani: 00:29:54
That’s why we also open source most of our toolkits. Why? Because then it’s a very clean slate for most of our companies, whether they are in our portfolio or not: they can still use the toolkits because they are open source. There is no limitation in terms of the code we use in the context of other projects. It’s not easy. It’s very exciting, but there are all these legal and trust issues that we have to deal with. But I think we are in a very good situation right now: we have figured out most of these kinds of nuances, and now we have open source toolkits. And it’s also giving back to the community. So it’s not only about our companies. If there is another under-resourced startup out there dealing with a similar problem, they can leverage our toolkits and open source libraries as well. 
Jon Krohn: 00:30:43
Yeah. This sounds hugely powerful and impactful. I had a conversation … So in 2020, I piloted a podcast called the Artificial Neural Network News Network, A4N. And it was having Kirill Eremenko, who at the time was the host of the SuperDataScience podcast, as a guest on A4N that led to me getting to know Kirill really well and eventually becoming the host of this program. But back last year, I interviewed someone named Dr. Rasmus Rothe, who is in Berlin, and he created something called the AI Campus. And so, this AI Campus is a similar idea of having many companies that can leverage a common set of software libraries. Like you said, a toolkit, so that companies working in different areas … They have a healthcare machine vision company that’s detecting tumors. They have another machine vision company that is building self-driving car algorithms for car companies. 
Jon Krohn: 00:31:50
And so, two completely different use cases. Their markets don’t overlap, much like your portfolio companies aren’t competitors. But these are two different machine vision companies, and there are a lot of things, related to transfer learning and ensemble models, that are common. And so, you can accelerate the self-driving car company through the research that’s happening at the tumor detection company. So I totally understand what you’re getting at. I think that kind of approach is really rare. You’re only the second person that’s ever mentioned it to me, but I can imagine how it’s so useful for your portfolio companies. So speaking of toolkits, are there particular tools or approaches that you and your team usually use, maybe even in association with the portfolio companies? 
Parinaz Sobhani: 00:32:43
Very good question. So we normally try to really stay on top of the open source community and figure out what’s coming out, because especially in machine learning and data science there are so many great open source libraries. So it’s really about making sure you are aware of everything happening there. For example, one of my favorites is Hugging Face. And every single day … Maybe that’s an exaggeration. Every single week there are new models, new features, new functionalities coming out of that team. Like most people in the machine learning community, we normally use Python and scikit-learn for [inaudible 00:33:24] classic ML problems. For more deep learning types of problems, we use TensorFlow, PyTorch, Keras. We are pretty flexible because we have to adapt to what our companies have been using and their tech stacks. But it’s pretty standard right now. It’s Python. Most of the time TensorFlow or PyTorch as it relates to deep learning techniques, or Hugging Face types of open source libraries that you can build on top of. 
Jon Krohn: 00:33:56
Nice. 
Parinaz Sobhani: 00:33:56
Yeah. And we also open source. 
Jon Krohn: 00:33:59
Oh, you open source some stuff yourself? 
Parinaz Sobhani: 00:34:01
Yeah. 
Jon Krohn: 00:34:02
Oh, that’s cool. 
Parinaz Sobhani: 00:34:03
We also open source. Yeah. 
Jon Krohn: 00:34:05
Nice. And so, many of these open source tools will be familiar to listeners. Python, the most popular machine learning language, period. And then, within that, libraries like TensorFlow, PyTorch, and Keras that are highly flexible and allow you to quickly build deep learning models, particularly in the case of Keras. But with TensorFlow and PyTorch, you can really get into the nitty-gritty with any kind of machine learning model and do very subtle, unique things, like your idea of stacked models and ensembles. With TensorFlow or PyTorch graphs, you can have all of those pieces work together as part of one single computational graph. So super powerful. The tool that I want to talk about, which I don’t think we’ve mentioned on the show before but is so cool, is Hugging Face. So yeah, they are one of the coolest companies. I have a data scientist on my team, Grant. He is in love with Hugging Face. He talks about them all the time because, as you say, it seems almost every day they are coming out with state-of-the-art things that they often open source and make available. And particularly, going back to what we talked about earlier in the conversation with transfer learning, they make transfer learning so easy. So I don’t know if there’s anything more there that you want to dig into. 
Parinaz Sobhani: 00:35:28
Yeah. We’ve worked with one of our companies, [Disco 00:35:33]. They are in the legal tech space, really helping lawyers with discovery: out of all these documents, thousands of documents that they have to deal with, finding the right materials. Finding what is relevant and what’s not relevant, what is junk and what’s going to help them with their case. It’s a classification and tagging problem, but the question is how we can leverage pre-trained models, how we can leverage cross-matter data aggregation. Whether it’s possible to pre-train a model, and then of course you can fine-tune it, or do domain adaptation. 
Parinaz Sobhani: 00:36:12
That was another piece that we’ve been thinking about, and we open sourced our toolkit around domain adaptation as well. Because, as you mentioned, in the context of representation learning and transfer learning, these pre-trained models are available for you, but they’ve been trained on completely different genres or topics. It’s like a human who understands common sense but, when it gets more technical or specialized, has no idea. So how can you fine-tune them and adapt them to your topic, to your problem? That started as something very specific for one of our companies, Disco, I think. And now it’s open source. You can find it on our website, or in our GitHub repository. 
Jon Krohn: 00:37:04
Oh, cool. 
Parinaz Sobhani: 00:37:04
And you can also use it to quickly adapt and fine-tune these models based on your own data and your problem. 
Jon Krohn: 00:37:16
Oh, nice. So the thing that you open sourced in collaboration with this portfolio company Disco … Fun name. It’s a tool that makes it easier to do transfer learning from pre-trained models. Is that right? 
Parinaz Sobhani: 00:37:31
Yeah. Exactly. Yeah, exactly. You can take any of those pre-trained models … These models are super big, super large. Even for no other reason, at least for environmental reasons, you don’t want to train these big models by yourselves. But at the same time, you want to get utility out of them because they are very powerful. So really, how can we take one of those off-the-shelf pre-trained models? Hugging Face has open sourced most of these pre-trained models, and you can try them. You can take one of those pre-trained models and use it for any classification, [inaudible 00:38:07] question answering, or tagging problem, right out of the box. It’s very easy to use Hugging Face, for example. But then, you might not get the same utility if you have a very specific domain. For example, the legal domain, which is very specific. It’s very different from, for example, the news or Wikipedia documents that these big models have been trained on. 
Jon Krohn: 00:38:26
Exactly. 
Parinaz Sobhani: 00:38:27
And then, the question is how we can adapt it to your own domain and fine-tune it to get more utility out of these pre-trained models, specialized, of course, for your problem. 
Jon Krohn: 00:38:38
Beautiful. I’m going to try to say it back to you and summarize what you just said, but it sounds … Yeah. It sounds like something really useful for listeners here. So Hugging Face allows you to, out of the box, as Parinaz just said, make use of pre-trained models. But these pre-trained models are going to be pre-trained, like you said, on Wikipedia articles or news, or sometimes maybe a scrape of much of the web. And that’s hugely general. And if you have some specific use case, like you’re in the legal space, like Disco is, then you’re going to need to fine-tune the model to this new, specialized data set. And so, very cool that you guys at Georgian have open sourced some tools to make that process easier. We will try to find that and put it in the show notes. Sorry, I spoke over you. Go ahead. 
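To make the transfer-learning idea concrete, here is a minimal, pure-Python sketch of the pattern described above: keep a pre-trained feature extractor frozen and train only a small head on your specialized data. Everything here, the keyword lists, the tiny dataset, and the function names, is invented toy material, not Georgian's actual toolkit; in practice the frozen part would be a Transformer body loaded from a library like Hugging Face.

```python
import math

# Toy "pretrained" feature extractor, kept FROZEN during fine-tuning.
# In real transfer learning this would be a pre-trained Transformer body;
# here it is just a fixed keyword-count featurizer, purely illustrative.
LEGAL_TERMS = {"plaintiff", "motion", "privileged", "exhibit"}
JUNK_TERMS = {"lunch", "weekend", "golf"}

def pretrained_features(text):
    words = text.lower().split()
    return [
        sum(w in LEGAL_TERMS for w in words),  # "relevant" signal
        sum(w in JUNK_TERMS for w in words),   # "junk" signal
        1.0,                                   # bias feature
    ]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fine_tune_head(examples, epochs=200, lr=0.5):
    """Train only a small linear head on top of the frozen features."""
    w = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for text, label in examples:
            x = pretrained_features(text)
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            for i in range(len(w)):  # gradient step updates the head only
                w[i] += lr * (label - p) * x[i]
    return w

# Tiny in-domain "fine-tuning" set: relevant (1) vs. junk (0) documents.
train = [
    ("motion filed by plaintiff with exhibit", 1),
    ("privileged memo on the motion", 1),
    ("golf on the weekend then lunch", 0),
    ("lunch plans for the weekend", 0),
]
w = fine_tune_head(train)

def predict(text):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, pretrained_features(text))))

print(predict("plaintiff exhibit attached") > 0.5)  # relevant document
print(predict("weekend golf and lunch") > 0.5)      # junk document
```

The design point the sketch makes: only the three head weights are trained, so a handful of in-domain examples is enough, which is exactly why fine-tuning is so data efficient.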
Parinaz Sobhani: 00:39:40
No, no. I think maybe one more point around this: even if you don’t want to use those pre-trained models, you might have enough data to actually train a BERT model yourself. I guess that was the case here. They have access to millions and millions of these documents, so they don’t even need to use any pre-trained models. But still, when you have a new matter, a new case, you don’t have enough training data for that one, and you definitely can’t afford to train a BERT model from scratch each time. So how can you leverage … It’s also in the context of transfer learning and data aggregation. Cross-customer, cross-matter, whatever works for your case based on your data rights. If you have existing models that you have already trained for another matter, for another customer, for another client, and the new use case, the new classification problem, is slightly different, how can you tune them for this new use case? And definitely, it’s going to be data efficient, because you don’t need access to as much training data. And it’s not only data efficient, it’s also cost efficient, because of the computation costs you would otherwise have to spend to train these models from scratch. And it’s also good for the environment, because now everybody knows about the carbon footprint of all these data centers. 
Jon Krohn: 00:41:13
Yeah. I mean, if people don’t … So we should definitely break this down. You mentioned BERT. This is an example of a transformer architecture. These became famous in the last few years in natural language processing. So in these kinds of situations where you’re working with legal documents, you’re going to want to use a natural language processing model. Today, almost every time, you’re going to pick a deep learning model to do that, and the state-of-the-art architectures to use are these transformer architectures, of which BERT, B-E-R-T, is one of the most well known. 
Jon Krohn: 00:41:48
And so, you can have … If you buy a consumer-grade graphics processing unit … So GPUs can dramatically speed up the training of deep learning models, as opposed to just trying to train them on the CPU of a machine that you have. But on a consumer-grade GPU, the BERT model might take so much of the GPU’s memory that you can only fit one or two training samples, one or two documents, in the RAM of that GPU at any given time. Prior to transformer models, you might have been able to fit hundreds of examples at once. So this just gives you an idea of the size of these model architectures. They’re so big, and it means that the compute to train them can be huge. So not only is that expensive for you, if you’re training from scratch without using one of these pre-trained Hugging Face models, but there’s the environmental point. 
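The GPU-memory point above can be checked with back-of-envelope arithmetic. The numbers below are rough illustrative assumptions, not exact figures: BERT-base at roughly 110 million parameters, float32 training with the Adam optimizer, and a hypothetical 8 GB consumer GPU.

```python
# Rough back-of-envelope: why BERT-sized models crowd out a consumer GPU.
# All numbers are approximations for illustration only.
params = 110_000_000        # ~BERT-base parameter count
bytes_per_float32 = 4

weights_gb   = params * bytes_per_float32 / 1e9  # the model weights themselves
gradients_gb = weights_gb                        # one gradient per weight
adam_gb      = 2 * weights_gb                    # Adam keeps two extra states per weight

training_overhead_gb = weights_gb + gradients_gb + adam_gb
print(round(training_overhead_gb, 2))  # ~1.76 GB before any activations

# On an assumed 8 GB consumer GPU, whatever is left must hold the batch's
# activations, which for a deep Transformer over long documents can be very
# large per sample, hence batch sizes of just one or two documents.
gpu_gb = 8
print(round(gpu_gb - training_overhead_gb, 2))
```

The takeaway is that weights, gradients, and optimizer state alone eat a fixed chunk of memory, and activation memory then scales with batch size and sequence length, which is what forces tiny batches.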
Jon Krohn: 00:42:48
So you said everybody knows … Maybe not everybody in the audience does know. There was a really famous paper … I guess it came out almost a year ago now. But Stochastic Parrots? Yeah. I knew you were going to know about that one. And that’s a really … I highly recommend checking that out. It’s a hugely controversial paper for reasons that I don’t know if we really have the time to dig into right now. But it led to people leaving Google under circumstances where Google and the people leaving have different stories about how those departures happened. But yeah, the Stochastic Parrots paper. Interesting for social reasons, but it also talks a lot about how these huge transformer models are bad for the environment. 
Parinaz Sobhani: 00:43:35
You are right. Now everybody is talking about carbon footprints, and we are all concerned about global warming and the environmental problems we are going to deal with for the next generations, or even our generation. For me personally, it was a trigger, because we always encourage our companies, especially in the area of machine learning, to use deep learning techniques, to use bigger models, which are definitely computationally heavy and intense. And it was a trigger for me to give it more thought and really think about how, again, we can leverage technology to reduce the carbon footprint of using these models. 
Parinaz Sobhani: 00:44:20
And that’s why transfer learning is very exciting for me, because it enables you to reuse existing models. Also, I think compression is another one, because when you’re thinking about the computation, one side of it is training, and the other side is inference and hosting. If you are hosting a bigger model, of course you need more CPU and GPU power, and that makes a bigger carbon footprint. Right? So how can we also compress these models? That’s why another research area that we started to dig into, with one of our companies again, Chorus, is around compression. It’s about distillation. And of course, one reason is the environment and the reduction of carbon footprint. The other one is cost, because it translates directly to your AWS bill. Right? 
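The heart of the distillation idea mentioned above is training a small student model on the teacher's temperature-softened output distribution rather than on hard labels. Here is a minimal sketch of just that softening step, with made-up logits; it is not Georgian's or Chorus's actual method, only the standard temperature trick in isolation.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax with a temperature knob: higher T flattens the distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)                          # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical teacher logits for one example over three classes.
teacher_logits = [4.0, 1.0, 0.5]

hard = softmax(teacher_logits, temperature=1.0)  # near one-hot: little signal
soft = softmax(teacher_logits, temperature=4.0)  # softened: richer targets

print([round(p, 3) for p in hard])
print([round(p, 3) for p in soft])
```

At temperature 1 the teacher's distribution is nearly one-hot, but at a higher temperature the non-top classes get meaningful probability mass, and it is those relative probabilities that the student learns from.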
Jon Krohn: 00:45:20
Exactly. Yeah. 
Parinaz Sobhani: 00:45:21
Yeah. And for startups, it’s also about cost reduction. So it’s just a win-win for me. I feel better about using these technologies because we are thinking about the reduction piece, and I’m also reducing the cost for our companies, especially the hosting cost. 
Jon Krohn: 00:45:40
Yep. It’s one of those situations where doing what’s good for the environment is aligned with what’s good for the company. And so, it’s not hard to get buy-in. 
Parinaz Sobhani: 00:45:49
Yeah, that’s right. That’s right. Yeah. 
Jon Krohn: 00:45:55
So something that comes to mind for me there is, when we talk about BERT and smaller models that might be able to do a lot of what BERT can do, there are these smaller models. You mentioned the word distillation, and that ties in. There is a model called DistilBERT, which is supposed to have a much smaller compute footprint, carbon footprint, and cost footprint, but you can get a lot of the same results. So that’s something that you could potentially look into. And you mentioned inference time. Something that’s interesting here … I don’t know if you were thinking about this, but we can talk about it explicitly … is that something you can do to reduce your cost and carbon footprint at inference time is quantizing your model weights: instead of using full 32-bit floating-point precision for your model weights, you can quantize the model to a lower precision, like 8-bit integers. And that allows your models to run more quickly in production, you save costs, and trees sprout up all around the world in thanks. 
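Quantization as described above can be sketched in a few lines: map float32 weights onto small integers with a single scale factor, then map back. This is a simplified symmetric per-tensor scheme with made-up weight values; real frameworks (PyTorch, TensorFlow Lite) typically quantize per layer or per channel and handle activations too.

```python
# Toy weight tensor; values are invented for illustration.
weights = [0.31, -0.72, 0.05, 0.98, -0.44, 0.0]

# Symmetric int8 quantization: one float scale shared by the whole tensor.
scale = max(abs(w) for w in weights) / 127

def quantize(w):
    return max(-127, min(127, round(w / scale)))  # store as a small integer

def dequantize(q):
    return q * scale                              # recover an approximate float

q_weights = [quantize(w) for w in weights]
restored  = [dequantize(q) for q in q_weights]

max_err = max(abs(w - r) for w, r in zip(weights, restored))
print(q_weights)          # 1 byte per weight instead of 4
print(max_err < scale / 2)  # rounding error stays below half a quantization step
```

Storing each weight in one byte instead of four cuts memory traffic roughly 4x, which is where the inference-time speed and cost savings come from, at the price of the small rounding error measured above.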
Parinaz Sobhani: 00:47:02
Yeah. We actually tried quantization as well, and we ended up combining quantization and distillation for that problem, which was speech transcription. And it worked very well. And again, I encourage you to read our paper, because we also publish. We published a paper, and you can find it on the web, or I can share the link with you to share with the podcast. But definitely you can find the paper and learn more about how we combined quantization and distillation. At the end of the day, we didn’t want to compromise the performance or utility, but at the same time we wanted to make it more cost efficient. 
Jon Krohn: 00:47:44
Awesome. So Parinaz, you have been talking about such cool problems that you’re tackling. They are truly at the cutting edge of AI. And you get to do this in a way that is making a big impact, typically for multiple portfolio companies, these fast growing tech startups that are having a big impact on the industries that they’re in. Sounds like a pretty amazing job. So first of all, I noticed that you are doing some hiring. So it looks like you’re basically always hiring top data scientists and machine learning engineers. So I don’t know if you want to talk about those roles in particular, but what do you look for in people that you hire? 
Parinaz Sobhani: 00:48:30
Very good question. So first of all, raw intelligence is definitely important for us, and we are really lucky to have a very smart team. But at the same time, what drives me personally, and I guess most of the people on our team as well, is impact. We never say, we’re not going to work on this problem because it doesn’t use a cool technology, because it doesn’t use BERT. So first of all, I guess, there’s the motivation piece. What really drives you? Is it solving the problem, or is it the technology? We are normally looking for people who are more passionate about solving the problem, where machine learning and deep learning and all these cutting-edge technologies are just different ways you can solve it. Even if you can solve it rule-based, you have to be honest and transparent and say, maybe let’s try the rule-based techniques first. Or maybe let’s start with logistic regression, and in the next versions of the model we can move on to more advanced techniques. So that’s the first one. Normally we are looking for pragmatic, practical, and- 
Jon Krohn: 00:49:37
That’s exactly the word I was going to use. Yeah. 
Parinaz Sobhani: 00:49:42
Yeah, exactly. That’s maybe the way we … We have a very pragmatic and practical team because we are working with startups. Maybe if you work for a big corporation, you have the luxury of being more passionate about the technology than about the problems. But that’s not the case for us. We are working with startups, and we have to be very pragmatic and practical. So passion for research and technology, but at the same time, being very practical and pragmatic. A quick learner, because we are moving from one problem to another problem, from one market to another market. You have to quickly learn and ask the right questions. I think that’s another characteristic that I’m normally looking for when we are thinking about hiring, because you can give … 
Parinaz Sobhani: 00:50:26
There are two types of people. The first type is given the instructions and the problem and can perfectly execute on that. The other type also asks: why should I solve this problem? Why does it matter? How does it really help us get to our goal? How does it improve the end customer’s or user’s experience? And I like to hire people who are curious about the why. Because if I make that decision of why this is a problem we have to work on, I might make mistakes. So I always like to be surrounded by people who ask this why question and challenge me, and correct me if I have made the wrong assumptions. So that’s another one. 
Parinaz Sobhani: 00:51:14
Engineering and coding skills are definitely important, because at the end of the day, we have to deliver code, and we have to deliver good-quality code. Maybe if you are working in a different environment, you can work with multiple engineers and hand it off to them, but we normally prefer full stack: someone who can do the research and machine learning, as well as deliver good-quality code. And then the last one, but not least, is communication skills. It’s really about being effective in your communication. Being able to influence others, because we often work with other data scientists and they might have strong opinions about why we should solve something in a certain way. So it’s about getting comfortable challenging others, listening very well, understanding where they are coming from and what their arguments are. And whether you believe they are right or wrong, being able to effectively get your message across. And of course, promoting your work, presenting what you have achieved. 
Jon Krohn: 00:52:26
Yeah. I love the points that you’ve made here. I collected five big things that you look for in people you hire. That they’re motivated by impact, meaning that they’re pragmatic. So they’re more about getting an impactful solution than necessarily just for its own sake using the most state-of-the-art technique, which makes perfect sense to me. The second thing was being a quick learner because you’re constantly moving between different industries and problems, which of course, in this position like Georgian where you have all these different portfolio companies, that’s inevitable. I love the third one. That is my favorite. That they’re comfortable challenging, with asking why. Including of leadership. So asking you, well, why are we doing it this way? I love that. I actually had … Almost a decade ago now, I was in job interviews with a company where they were like, let’s say you’re in a scenario where … So the person that would be my hiring manager says, let’s say, you’re in a scenario where I ask you to do something and you don’t think it’s the right way to do it. What do you do? And I’m like, well, of course I try to come up with a way of explaining why I think there’s another way to do it. 
Jon Krohn: 00:53:35
And he was like, no, no, no, no. Let’s say the CEO comes to you and says, you must do it this way. What do you do? And I’m like, well, the same thing. I’m going to try to explain why I think there’s a better way. And he’s like, nope. Sometimes you just have to do it and you can’t understand why. And I mean, for me, that’s a big red flag. That’s not somebody I want to be working with. So I love that. It’s the opposite culture at Georgian. The full stack piece makes perfect sense to me. Especially when you have people that are working on these … I can imagine there’s scenarios where people on your team, you might have one person dedicated to one particular problem with a particular portfolio company for a few weeks, and so that person needs to be able to go all the way from model development all the way through to deployment. 
Jon Krohn: 00:54:24
And then, your final point was communication, which I also … So that is the most common skill that I hear when I ask this question of guests on the show. But you said something that is so obvious in that, that I haven’t heard with a previous guest, which is listening. So when people bring this up, so when communication comes up, it’s the number one by far trait that people look for in people they’re hiring. But they always talk about verbal communication and written communication, which you did, but you’re the first person to mention listening, which of course is critical in communication. So thank you for that. 
Parinaz Sobhani: 00:55:02
I thought about my … I have an analytical personality. But I always think about why I failed in my communication, why I failed in influencing someone. And the number one reason is that I didn’t listen. I didn’t really understand where they were coming from, and I kept pushing and iterating and reiterating on the same points, without really understanding what they were trying to say. So that’s why I was like, yeah, that’s the main reason for my failures. So I have to make sure the next people we hire are better than me, at least in listening.
Jon Krohn: 00:55:45
Oh, man. All right. So that’s really cool. I definitely encourage people to look out for those jobs on the Georgian website. Sounds like an amazing place to work. Seriously, I am jealous of that scenario. Sounds like an amazing leader in you, Parinaz, and amazing problems you’re tackling. Very cool. So how did you decide to become involved in computer science and machine learning in the first place? You probably wouldn’t use this kind of language yourself because you seem like a humble person, but I can say it: you are a leader. You are a huge leader in data science now in this role that you have as the head of machine learning and applied research at the biggest private fund in Canada. This is massive. The impact that you have is huge. How did you find your way to where you are now? What’s your motivation? What’s your why behind what you’re doing?
Parinaz Sobhani: 00:56:39
I think that’s how … First of all, that’s how my brain is wired. I love solving problems, and I was lucky enough to learn about coding in high school. And I found, oh my God, that’s so powerful. Programming and coding and software are some of the best ways to solve problems in the current world. Technology in general. And that’s why I ended up studying computer science and computer engineering. But at the same time, I’m a lazy person. And in the third year of my Bachelor’s, actually, I learned about machine learning, and I was like, oh my God, that’s so perfect for me, because it can automate me. Because coding is all about step-by-step, giving the instructions, solving the problem. 
Parinaz Sobhani: 00:57:34
With machine learning, it’s on the opposite side of the spectrum. It’s all about mathematical optimization. You give it training data, input data and outcome data, and let the model figure out how to solve the problem using optimization. And I was like, that’s really, really awesome. And I started to play with different problems during my Master’s, and then for my PhD I moved to Canada and was thinking about, what is the hardest problem for AI to tackle? At that time, I thought natural language understanding was the most difficult problem. And the reason is that natural languages evolved over time, and they evolved to capture the complexity of the human race. That’s why human languages are so complex right now. And I thought that was going to be the most difficult problem for AI to tackle, because it basically requires understanding the complexity of the human brain as well. 
Parinaz Sobhani: 00:58:37
That’s why, at that time, I started to look into different types of problems. One of the most common problems in [inaudible 00:58:47] was sentiment analysis: what we frame as classification into positive, negative, or neutral. But I thought it wasn’t as meaningful for me, because what might be even more impactful is one step further, what we call stance classification. That’s mostly about: you have a piece of text, you have a target of interest, and you really want to know the overall position of the author of the text towards the target of interest. And the target of interest can be a concrete entity, like a person or a product, but it can also be an abstract entity, like legalization of abortion. And if I use legalization of abortion rather than anti-abortion as the target, it completely changes the problem. So even the way I frame my target of interest completely changes the problem. I found it a very interesting problem, and it may be one of the closest classification problems to natural language understanding. 
Jon Krohn: 00:59:57
I’m going to try to say back to you what you’ve told me to make sure that I’m on track. So this sounds like it’s now … We’re getting into your PhD research, right? 
Parinaz Sobhani: 01:00:09
Yeah. 
Jon Krohn: 01:00:10
So you studied computer science in Iran. You did a Bachelor’s and Master’s there in computer science. You realized during those degrees that you have this opportunity to use machine learning to figure out how your computer program should run, without you needing to have if/else statements for everything. Which, actually, is a quick side note: there’s a brilliant blog post by Andrej Karpathy called Software 2.0, which builds a lot on that idea that you’re describing there, Parinaz. I’m not telling you that, because you’re probably aware of it, but I’m letting the audience know. So it’s this idea that software 1.0, we still use that as data scientists and machine learning engineers. We still use Python most commonly as the language, as the way of having our code run. But then on top of that software 1.0, we have this software 2.0 of model weights that learn, and we don’t have to explicitly program how information should flow through our program. And that Software 2.0 article by Andrej Karpathy does a great job of summarizing all the advantages of doing it that way, of which laziness is one for sure: we can be lazier and we don’t have to write as much code. 
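The Software 1.0 versus 2.0 contrast can be shown in miniature: the same tiny function written once as explicit rules and once as weights learned from input/output examples. The one-neuron perceptron below is a deliberately toy stand-in for deep learning, invented purely for illustration.

```python
# Software 1.0 vs 2.0 in miniature: the logical OR function, first as
# hand-written logic, then as weights learned from data.

def or_v1(a, b):                      # Software 1.0: explicit if/else rules
    if a == 1 or b == 1:
        return 1
    return 0

# Software 2.0: give a one-neuron model the input/output pairs and let
# optimization find the weights.
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w0 = w1 = bias = 0.0
for _ in range(20):                   # classic perceptron training loop
    for (a, b), target in data:
        pred = 1 if w0 * a + w1 * b + bias > 0 else 0
        err = target - pred           # 0 when correct; +-1 when wrong
        w0 += err * a
        w1 += err * b
        bias += err

def or_v2(a, b):
    return 1 if w0 * a + w1 * b + bias > 0 else 0

# The learned program agrees with the hand-written one on every input.
print(all(or_v1(a, b) == or_v2(a, b) for (a, b), _ in data))  # True
```

Nobody wrote the decision rule in `or_v2`; it emerged from the data, which is the whole point of the Software 2.0 framing.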
Jon Krohn: 01:01:33
But so, you became fascinated by machine learning. You came to Canada, to the University of Ottawa, and did a PhD in artificial intelligence and natural language processing, and your particular research … You were particularly motivated by this very hard problem of natural language understanding. So how can we have machines that are able to replicate this extremely nuanced, complex language capacity that human brains have? And then you started describing a specific problem. You were talking about how there are some natural language classification problems that are, and I agree with you, relatively unsatisfying. So you talked about sentiment analysis. You could have a movie review, and you pass the natural language of the movie review into your model, and the model predicts: is this a positive movie review? Did the person like it? Did they not like it? Or are they neutral? So you have these three buckets. But then, you were talking about how your research is a lot closer to real natural language understanding, specifically as it relates to a particular target entity. Flesh it out a bit more for me there. Maybe an example, so that I can grasp exactly what you were studying. 
Parinaz Sobhani: 01:03:03
Yeah. So imagine I say … Again, maybe the example would be legalization of abortion versus anti-abortion. The way you frame your target of interest changes everything. So say I use legalization of abortion as the target of interest, and I have a tweet that says, “Women should not be allowed to choose because it’s a gift from God.” If your target is legalization of abortion, the author is against it. But if your target is anti-abortion, the author is in favor of it. So it’s not as easy as positive-negative, because positive-negative doesn’t make sense here anymore. It’s all about your position towards this target. And what’s the target? And what’s the relationship between this target and your opinion? 
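The point that the framing of the target flips the label can be mocked up in a few lines. The "classifier" here is just hand-written keyword cues, a hypothetical stand-in for a real stance model, to show why the identical text maps to opposite stance labels under different targets:

```python
# Toy stance classification: the SAME text gets a different label depending
# on how the target of interest is framed. Cue phrases are invented stand-ins
# for a learned model, purely for illustration.

OPPOSES_ABORTION_CUES = {"gift from god", "should not be allowed to choose"}

def opposes_abortion(text):
    t = text.lower()
    return any(cue in t for cue in OPPOSES_ABORTION_CUES)

def stance(text, target):
    """Return the author's stance toward the stated target of interest."""
    against_abortion = opposes_abortion(text)
    if target == "legalization of abortion":
        return "AGAINST" if against_abortion else "FAVOR"
    if target == "anti-abortion movement":
        return "FAVOR" if against_abortion else "AGAINST"
    return "NONE"

tweet = "Women should not be allowed to choose because it's a gift from God."
print(stance(tweet, "legalization of abortion"))  # AGAINST
print(stance(tweet, "anti-abortion movement"))    # FAVOR
```

Unlike plain sentiment analysis, the label is a function of both the text and the target, which is why stance classification sits closer to genuine natural language understanding.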
Jon Krohn: 01:04:03
I see. Yeah, that is definitely nuanced. It sounds like a really tricky problem and a lot closer to our real understanding of natural language. That’s really cool. And so, it looks from a … I’ve looked at the titles of some of your papers from your academic time, and it sounds like you were working on these kinds of natural language models, like LSTMs, long short term memory units, and applying those to these advanced natural language understanding problems. 
Parinaz Sobhani: 01:04:34
Yeah, definitely. At the time LSTMs were state-of-the-art because there was no BERT. 
Jon Krohn: 01:04:39
Yeah. 
Parinaz Sobhani: 01:04:40
When I did my PhD. And I also worked for the National Research Council of Canada in [inaudible 01:04:48] lab, and we also started thinking about what comes next, given, again, the complexity of language. Noam Chomsky, on the other side of the spectrum, has always believed that languages have structure: grammatical structure, semantic and syntactic structure. So we thought about how we can leverage these structures to better understand the meaning and, in the end, reframe the problem as a classification problem. And then we started to think about how we can use the grammatical structure, because most of the LSTMs at that time were left-to-right or right-to-left sequential models. None of them were tree-based or DAG-based. So we started to explore different types of classification problems and how we can leverage the structure of the language. And that's how we ended up publishing our papers. I was lucky to work with very good researchers and we could publish in the best venues.
Jon Krohn: 01:05:46
Very cool. 
Parinaz Sobhani: 01:05:47
And then I used similar techniques for machine translation. I joined Microsoft Research and worked with the researchers there. Again, at that time the state-of-the-art was sequence-to-sequence models, two RNN models, but they were left-to-right or right-to-left, so they didn't leverage the structure of the language. So we started to think about how we can use similar data structures, tree-structured RNNs and LSTMs, for machine translation. Especially, for example, from Japanese to English, or languages where the structure might be even more impactful to the meaning.
Jon Krohn: 01:06:22
That makes a lot of sense to me, and it does seem to me like there's a huge opportunity there. So the vast majority of these natural language models, like the recurrent neural networks, long short-term memory networks, even transformer models like BERT, work in a sequential pattern, a one-dimensional way. They go from left to right. And some of them are bidirectional. They'll go both ways, left to right and right to left at the same time. So you mentioned a tree structure. So there you have branches, a non-sequential structure. And then you also mentioned DAGs, directed acyclic graphs, which can be even more complex than trees. They can have all kinds of direction and branching and … anything but a loop. So I can see how-
Parinaz Sobhani: 01:07:16
Yeah. So maybe one point. I think, actually, with BERT we solved that problem. That's why we don't need tree structures or DAG structures, because now we are randomly masking some of the tokens. Before, when we were dealing with RNNs, we were predicting the next word based on the previous words, whether you were traversing right to left or left to right. But that's not the case with transformers anymore. We are just randomly masking some of these tokens, and we are trying to predict those masked tokens based on all the tokens around them. Left or right doesn't matter. So that's why I believe we don't see tree-structure or DAG-structure transformers anymore. We already solved the problem in a different way.
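A rough sketch of the masking step Parinaz describes, assuming BERT-style random masking at a 15% rate (the helper function and rate here are illustrative, not the exact BERT recipe, which also sometimes keeps or substitutes the original token):

```python
import random

def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Randomly replace a fraction of tokens with [MASK], BERT-style.
    The model then predicts each masked token from ALL surrounding
    tokens, left and right, rather than only from preceding ones."""
    rng = random.Random(seed)
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            masked.append("[MASK]")
            targets[i] = tok  # the prediction target at this position
        else:
            masked.append(tok)
    return masked, targets

tokens = "the model predicts masked tokens from both directions".split()
masked, targets = mask_tokens(tokens)
print(masked)
```

Because the prediction target at each masked position can draw on context from both sides, the directionality constraint of left-to-right RNNs simply disappears.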
Jon Krohn: 01:07:59
Cool. Yeah. You know this stuff way better than me so I'm glad that you said that. And the only word there I think that we should clarify for the audience is token. Basically, a token is a unit of text. It could be a word or it could be a character. So with your natural language model, at some point when you create your architecture you decide, okay, I'm going to break up all of my natural language into words, or maybe sometimes pairs of words so that something like New York is treated as a single token as opposed to two tokens, two separate words. But yeah, I love that you just taught me something. So I greatly appreciate that.
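Jon's New York example can be sketched as a toy tokenizer; the bigram-merge table here is a hypothetical stand-in for the vocabulary a real subword tokenizer would learn from data:

```python
def tokenize(text, merges=frozenset({("New", "York")})):
    """Toy word-level tokenizer that merges known bigrams into one token,
    so a phrase like 'New York' becomes a single token. The merge set
    is a stand-in for what a real trained tokenizer would learn."""
    words = text.split()
    tokens, i = [], 0
    while i < len(words):
        # Merge a known bigram into a single token, else emit the word.
        if i + 1 < len(words) and (words[i], words[i + 1]) in merges:
            tokens.append(words[i] + " " + words[i + 1])
            i += 2
        else:
            tokens.append(words[i])
            i += 1
    return tokens

print(tokenize("I moved to New York last year"))
# ['I', 'moved', 'to', 'New York', 'last', 'year']
```

Real tokenizers (e.g. the byte-pair-encoding variants used with BERT-family models) work at the subword level, but the principle is the same: the vocabulary decides where token boundaries fall.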
Jon Krohn: 01:08:38
So we’ve learned a ton from you about the work that you do at Georgian, which is obviously extremely exciting. We’ve talked about how you got into what you’re doing. There’s one last topic that we touched on a little that I want to get into a bit more. So we talked about the ethics of AI with respect to the environment. But there’s another topic that I know is of interest to you, which is another AI ethics topic, fairness. So you’ve written a blog post about this, which we’re going to share in the show notes. But maybe you want to just talk a little bit about that. 
Parinaz Sobhani: 01:09:17
Definitely. I think one of my core values is fairness, and it's mainly because I grew up back in Iran, with all the problems there, women's rights and suppression mainly. And I always keep thinking about how we can build a fair world and how we can make some contribution to making this world a little bit more fair, which is a very hard problem because the world is fundamentally unfair. But I started to work on AI and machine learning. And 10 years ago, if you had asked me what I thought, I would have said these models are fair because it's all about mathematical optimization. It's completely objective. And math is objective. And I would have passionately argued that we have to automate away the humans, because humans are sexist, humans are racist, but these models, these machines, they can't be sexist or racist. And I think that was a mistake. Maybe I didn't know enough, but now we all know. We all know these models can also be sexist and racist, because the main ingredient we are using is data. And where is the data coming from? Data is coming from human behaviors or human labels or human annotations.
Parinaz Sobhani: 01:10:40
So it's almost impossible to get a completely objective source of truth or data at scale. Especially at scale. Maybe you can get 10, maybe you can get 20 samples. But we know, especially for these large models, you need millions or billions of samples of training data. So definitely, when the main ingredient is sexist and racist, your mathematical optimization, no matter what it is, depending on your objective, can also end up being a racist or sexist model. And I'm so passionate because I still believe we might be able to move the needle. I'm not as optimistic anymore, but still, all the research going on is encouraging for me. And of course, given that I have a more applied focus right now, I'm not actively contributing to this research field. But at the same time, I always try to read the new papers and stay on top of it: how we can really use technology, how we can even debug these models to really understand the root cause of these biases.
Parinaz Sobhani: 01:11:59
And sometimes it's easier to solve the problem, or remediate it, or have some remediation plan. But sometimes it's harder. It might not be as straightforward. But even transparently communicating the bias in your model and the blind spots of your model might be easier than dealing with humans. Because humans … it's very hard. Dealing with humans is the most difficult problem in the world for me.
Jon Krohn: 01:12:29
Hell is other people, I think is the saying. 
Parinaz Sobhani: 01:12:37
Yeah. I should add, for me. For me, technology and machines and software are definitely easier to deal with than humans. So I'm still encouraged. I'm so hopeful that if we do more research, if we have more awareness and more demand from the end customers, from the end users of these models, to understand, to some extent, exactly how they work, what they use as input, and what the potential harms are. And of course, legal and compliance are going to be another aspect of it: how governments can also help to have some sort of audits for these models. So much is going on. And it's not an easy problem. I always say, please don't oversimplify the fairness problem, because I have seen it in other organizations where they simply remove age and gender and maybe location from their data and say, "Problem solved. Now we don't have access to those features or attributes, so our model is fair." Let me give you an example. If I ask you which social media platform you use the most, you might feel, okay, that doesn't really reveal my gender or ethnicity. But I can tell you, if your answer is Pinterest, you are more than 80% likely a white woman.
Jon Krohn: 01:14:09
Right. 
Parinaz Sobhani: 01:14:11
Right? So it's not as simple as removing all these sensitive features from your data, because there are so many other attributes that have a strong correlation with those sensitive attributes.
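The Pinterest example can be sketched with invented numbers. This hypothetical toy sample shows how a seemingly innocuous feature can act as a proxy for a removed sensitive attribute; none of these figures are real survey data:

```python
# Hypothetical toy data: even after dropping the 'gender' column from the
# features, 'favorite_platform' still leaks it. All numbers are invented
# for illustration only.
rows = [
    {"favorite_platform": "Pinterest", "gender": "woman"},
    {"favorite_platform": "Pinterest", "gender": "woman"},
    {"favorite_platform": "Pinterest", "gender": "woman"},
    {"favorite_platform": "Pinterest", "gender": "woman"},
    {"favorite_platform": "Pinterest", "gender": "man"},
    {"favorite_platform": "Reddit",    "gender": "man"},
    {"favorite_platform": "Reddit",    "gender": "man"},
    {"favorite_platform": "Reddit",    "gender": "woman"},
]

def p_gender_given_platform(gender: str, platform: str) -> float:
    """Estimate P(gender | favorite_platform) from the toy sample."""
    matching = [r for r in rows if r["favorite_platform"] == platform]
    hits = sum(r["gender"] == gender for r in matching)
    return hits / len(matching)

# Dropping 'gender' from the training features would not remove this signal:
print(p_gender_given_platform("woman", "Pinterest"))  # 0.8
```

A model trained on the remaining features can still reconstruct the sensitive attribute through correlations like this one, which is why deleting columns alone does not make a model fair.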
Jon Krohn: 01:14:26
Yeah. And it can even be encoded in the way the language is used. So even if you were able to remove all these demographic factors, you say, okay, we're not going to have Pinterest, we've got to get rid of that word because it's too related to white women. You could still have … Just the way that language is used can vary by gender or race. And so, intimately tied up in natural language are these implicit indications of sociodemographic group. And so, I agree with you 100%. These are not easy problems to solve, but they are important problems to solve. And so, two resources to point listeners in the direction of … So first of all, Parinaz's blog post, which is on the Georgian website. It's called Seven Principles of Building Fair Machine Learning Systems. We'll have a link to that in the show notes. And we also did an episode on AI ethics, if you're interested in hearing more about that. So episode 449 with Ayodele Odubela is focused entirely on that topic, if you want to learn more about it.
Jon Krohn: 01:15:37
So Parinaz, this has been an extremely enjoyable experience. I’ve learned so much from you and had so much fun, but I want to start winding down the episode a little bit and be mindful of your time. And so, I always ask guests on the show before I let them go. Do you have a book recommendation for us? 
Parinaz Sobhani: 01:15:58
Definitely. I'm a feminist, and now I live in Canada. Canada is my second home and I'm a proud Canadian. So my favorite feminist authors, great female authors, are Margaret Atwood and Alice Munro. I love them. I love their books. I wasn't a big fan of short stories, but Alice Munro completely changed my perception of short stories.
Jon Krohn: 01:16:26
Wow. 
Parinaz Sobhani: 01:16:26
And since the pandemic I started to read more and more of her books. And honestly, that was the highlight of my pandemic. I couldn't wait for quiet time to read more or listen to her audiobooks. Definitely, those are my two favorite authors. And I just love novels. I love stories.
Jon Krohn: 01:16:52
Nice. So as a Canadian, I strongly support your choices here. We had to read a lot of Margaret Atwood in high school and … yeah. Some deeply influential books, some of which have been made into very popular series. The Handmaid's Tale, for example. And I love that Alice Munro recommendation. I'm often looking for a short read, because sometimes I'm not in the mood to get deep into a novel that I know, at how slowly I read, is going to take me at least weeks to get through. It would be nice to have something that I could just draw a line under in a couple of days. And so, yeah. Short stories. I don't know how I hadn't thought of that. Alice Munro. Do you have a particular collection of short stories by her, or-
Parinaz Sobhani: 01:17:37
All her books are excellent. And she got, I guess, the Nobel Prize for one of her books, but I can't remember exactly which of the books. But maybe you can start with that one.
Jon Krohn: 01:17:45
Nice. Yeah. I’ll look that up. It sounds like I can’t lose. All right. So, Parinaz, you have enlightened us with so much in this episode. How can people follow you to get more? 
Parinaz Sobhani: 01:17:59
I think the best way to reach out or follow me is through Twitter, and my Twitter account is @ParriAIML.
Jon Krohn: 01:18:12
Nice. We will have that in the show notes for sure, Parinaz. Thank you and, yeah. Enjoy the rest of your day. We’ve got a nice day here in Southwestern Ontario, don’t we? I hope you get to spend some of it outside in the sunshine. 
Parinaz Sobhani: 01:18:27
Thank you. Yeah. The same. Enjoy the last days of summer. 
Jon Krohn: 01:18:36
What an amazingly intelligent and thoughtful person Parinaz is. I learned so much from her in this episode, including how Georgian accelerates the impact of the tech startups in their investment portfolio by providing free collaborative resources for scoping out and delivering data science projects. We talked about how transfer learning can enable powerful models to be trained while limiting financial and environmental impact, with a special shout-out to Hugging Face for making transfer learning with massive state-of-the-art transformer-based models of natural language like BERT relatively straightforward. We talked about the attributes Parinaz looks for in the data scientists and machine learning engineers she hires, specifically pragmatism, communication, quick learning, being comfortable challenging leadership, and being full stack from model creation through to deployment. And we talked about how math is objective, but the large-scale training data we use to train models, including natural language models, is subjective and contains biases that are not simple to mitigate.
Jon Krohn: 01:19:44
As always, you can get all the show notes, including the transcript for this episode, the video recording, any materials mentioned on the show, the URLs for Parinaz's Twitter profile as well as my own social media profiles, at SuperDataScience.com/509. That's SuperDataScience.com/509. If you enjoyed this episode, I'd greatly appreciate it if you left a review on your favorite podcasting app or on the Super Data Science YouTube channel. I also encourage you to let me know your thoughts on this episode directly by adding me on LinkedIn or Twitter, and then tagging me in a post about it.
Jon Krohn: 01:20:21
To support the SuperDataScience company that kindly funds the management, editing, and production of this podcast without any annoying third-party ads, you could consider creating a free login to their learning platform at SuperDataScience.com, or consider buying a usually pretty darn cheap Udemy course published by Ligency, a SuperDataScience affiliate, such as my own Mathematical Foundations of Machine Learning course.
Jon Krohn: 01:20:44
All right. All right. Thanks to Ivana, Jaime, Mario, and JP on the SuperDataScience team for managing and producing another awesome episode for us today. Keep on rocking it out there, folks. And I’m looking forward to enjoying another round of the SuperDataScience podcast with you very soon. 