Kirill Eremenko: This is episode number 237 with Principal Attorney, Jessica Merlet.
Kirill Eremenko: Welcome to the SuperDataScience Podcast. My name is Kirill Eremenko, Data Science Coach and Lifestyle Entrepreneur. And each week, we bring you inspiring people and ideas to help you build your successful career in data science. Thanks for being here today, and now let’s make the complex simple.
Kirill Eremenko: Welcome back to the SuperDataScience Podcast ladies and gentlemen. Super excited to have you back here on the show because today I have a very special guest joining us for this episode. A dear friend of mine, Jessica Merlet, who is the principal attorney of the Law Office of Jessica Merlet and its European counterpart, Merlet Legal Consulting. What you need to know right off the bat about Jessica is that she’s an extremely experienced lawyer. She is licensed in three US jurisdictions: Illinois, Georgia and Washington DC. And now she resides in the Netherlands, where she offers GDPR legal services as well as other compliance services to her European clients. As you can already sense, we have a very exciting guest on the show today. What we talked about is GDPR, the General Data Protection Regulation, which was introduced in Europe in May 2018. And before you say anything, you do need to hear these insights.
Kirill Eremenko: Why is that? Well, whether you are a business owner, a director or executive of an enterprise, or a data analyst or data scientist analyzing data, you do need to know all about GDPR. And moreover, whether you’re in Europe or not, a lot of these things will still apply to your company and you do need to know them. More details are available on the podcast. So make sure to tune in and soak in all this amazing knowledge that Jessica shared with us today. In fact, today you’ll get such a comprehensive overview of GDPR that you could call this podcast GDPR A to Z: everything you need to know about this new legislation. We will cover off the whole life cycle of data within a company: the reasons why, how and when you can capture data and what the requirements are for that; how you have to store data in an organization and how long you can store it for; and how you can and cannot analyze data about your customers.
Kirill Eremenko: Plus, in addition to all of that, you will learn terms such as data controller, data processor, what sensitive information is, what affirmative consent is, the four pillars of GDPR, the six legal bases for capturing data and much, much more. In addition to all of that, Jessica shared several case studies which will help you understand GDPR even better. And the cherry on top of the ice cream is that we’ve prepared a special cheat sheet for you to download and keep, for you to follow along with this podcast if you are in front of the computer, or for you to revise later on so that you can soak in all this knowledge even better. You can find this cheat sheet at www.superdatascience.com/237. That’s the show notes page where we usually post everything about our episodes. Once again, that’s www.superdatascience.com/237.
Kirill Eremenko: There, you can download this cheat sheet and keep it. And that way you will always be able to reference all these things that you will hear on this podcast. And without further ado, I bring to you principal attorney Jessica Merlet.
Kirill Eremenko: Welcome back to the SuperDataScience Podcast ladies and gentlemen. Super excited to have you back here on the show. And today, I’ve got a very special guest, a great friend of mine and a very experienced lawyer and principal attorney, Jessica Merlet. Jess, welcome to the show. How are you today?
Jessica Merlet: I am great Kirill. Thank you. How are you?
Kirill Eremenko: I’m doing very, very well. Thank you very much. It’s very nice and hot here in Australia, how’s it going in the Netherlands?
Jessica Merlet: Well, I’m sitting here looking outside of my window at all of these birds that are playing. Everyone is waiting, and I think eating their seeds because we’re supposed to have quite a lot of snow, which is a rarity for the Netherlands. But we’re supposed to get quite a lot of it later today. So I’m kind of excited for that actually.
Kirill Eremenko: That’s awesome. It’s really cool, I love how we’ve been chatting about your house move. So for our listeners, Jessica is moving house soon. And I still can’t get my head around how she’s managing to move her most valued possession, the jacuzzi. Tell us a bit about that, your outdoor spa?
Jessica Merlet: Well, my outdoor spa, I guess. I had originally thought that I was going to sell it for this house move simply because it is such a difficult thing. But I think I can hire a few strong men from the local university to figure out how to move it to the new house. Luckily, it’s not that far away, maybe 5 or 10 kilometers. So I think it will make it, I’m not sure how to get it hooked up to all the electric once it gets there. But hopefully, it will be okay and hopefully all the snow will be melted by the time that all of that is happening as well.
Kirill Eremenko: Nice, nice. That’s really cool. Well, Jess, very excited to have you on the show. It’s been ages since we met, and I think over the years we’ve really bonded through the different discussions we’ve been having about legal stuff and ideas. And I really appreciate it. It was your idea to arrange this podcast, so tell us why. Where did this idea come from, to come on the show and share the legal aspects of data and analytics?
Jessica Merlet: Sure. So recently in Europe, the whole data landscape has undergone a change. And I think many of your students and many of the people that listen to your podcast are probably familiar with that. I think it would be difficult to not be familiar with that. In fact, because there have been quite a lot of news articles and publications that have been coming out over the last year about how data and privacy has changed in Europe. And a few weeks ago, I actually did a course at Maastricht University to be a certified data protection officer. Data protection officer is a position, an independent position with a company that is in charge of really being a liaison, being a coordinator with the European data protection authorities and to assist the company with analyzing and ensuring compliance with these new laws.
Jessica Merlet: And I did the course, and in the course we talked really quite a lot about data analytics and how data processing, how doing data analytics, how AI, how all of that is impacted by the new laws. And in fact, how all of that also impacted the new laws and made the changes and drove some of the changes that have come about. And I thought really for your podcast listeners it would be something different, but also something very informative, and something that they’re not necessarily thinking about all of the time really. What are the laws and what has changed in the last year as to how analytics can be done?
Kirill Eremenko: Fantastic. Fantastic. Super excited about that. It is indeed a very relevant topic in this day and age. And everybody should be thinking about it if they’re not already thinking about it. Not just from a company perspective and for business leaders, that’s for sure. If you’re a business leader or owner or an executive listening to this, then this is definitely something you have to be adept at and know about. And even if you’re just a practitioner inside a business and you might think this doesn’t apply to me, no, it actually does. There are a lot of things that you need to know about the legalities of using data and the different types of analytics on that data that are coming into place. And so Jess, when you mentioned these new regulations, are you talking about the GDPR, the General Data Protection Regulation in Europe?
Jessica Merlet: Yes. So that came into effect in May of 2018, and it’s something that all companies really are still trying to come into compliance with. I would say there’s no company in the world that is actually fully compliant or even a good part of the way there. As we just saw, in fact last week, Google was fined 50 million euros for not being in compliance. So it’s something that companies are working on. And so there’s the GDPR, and there are also a few other laws that impact data protection and data privacy. For example, there’s a new ePrivacy directive currently under review, which talks about things like additional data protection rules for telecommunication networks or internet service providers. It talks about metadata, and it talks about confidentiality of communication.
Jessica Merlet: So that’s another one that’s out there. It’s currently under review. And I think a big takeaway from this is that the landscape really is constantly changing. Not only do we have GDPR in Europe, but also of course we have the California privacy and data protection rules that also recently have been modified. So it’s not just Europe, but the focus for today is Europe because that’s really sort of our leading example in the world right now.
Kirill Eremenko: That’s great. And just for those of you listening who are about to tune out because you’re not in Europe but in the US, I specifically asked Jess just before the podcast how this affects companies outside of Europe. Jess, could you please repeat that description you mentioned about the three types of businesses and how they are affected by the GDPR?
Jessica Merlet: Sure. A lot of people say or a lot of my clients as well come to me and they say, Jessica, this is not relevant to me, it’s not relevant for me. I’m not based in Europe. But the GDPR doesn’t care where your company is based, it doesn’t care even if you’re a company. You might be an individual that has a smaller startup that’s located in Mongolia. Well, you still have to comply with the GDPR if you meet some of the requirements. So of course, companies that are based in Europe have to comply with GDPR. What we’re looking at is, are you processing European customers’ data or are you collecting European customers’ data? So if you are based in Europe, naturally by default you’re probably doing some collection of European data. Also, if you’re a public entity that’s based in Europe, then you have to comply with GDPR as well.
Jessica Merlet: But also, I think more importantly, for international companies that are asking themselves, do we have to comply, and really that don’t necessarily want to comply. What the GDPR asks is whether you’re doing any sort of large scale data collection of European customers. Actually, I want to back up. I’m going to probably use the word customer, but in this sense it doesn’t only apply to customers. The GDPR also applies if you have a workforce, for example, if you have independent contractors, if you’re using vendors in Europe. So it’s not just customers that we’re concerned with. We’re concerned with, are you collecting or are you processing data of any individual that’s located in Europe? So not just customers. If I screw that up and say customer, customer, customer during this, always keep in mind please that it’s also your staff, it’s also your vendors, whomever.
Jessica Merlet: So we want to ask ourselves, is my company doing large scale processing of European data? And the GDPR doesn’t really define that well. I have actually asked whether there are any sort of metrics that we can look at, say how many data subjects (data subject is the word that the GDPR uses to encompass customers, independent contractors and so on). So is there a certain number of data subjects whose data we have to be processing before we have to comply with GDPR, or a volume of data, or a duration of processing, or a geographical extent? The GDPR takes all of those things into consideration, but it doesn’t really give us any sort of strict numbers. So we have a few examples that could, I think, be helpful. For example, if a business processes data subjects’ data in its regular course of business, or if it regularly sells products through eCommerce to European customers, if it regularly collects and does data analytics on European data subjects, if it processes real-time geolocation information of data subjects who are located in a fast food chain.
Jessica Merlet: In fact, I have a very specific example here: processing real-time geolocation data of customers of an international fast food chain for statistical purposes by a processor who might be specialized in that type of data analytics. Even if that company is located in America, for example, that company would still have to comply with the GDPR. If you’re working for an insurance company, for example, or a bank that’s an international institution, that would have to comply with the GDPR if it has European customers. Service providers, internet service providers, all those types of companies do have to comply. And really what the GDPR asks is, are you doing regular and systematic monitoring of any European data subjects? So are you tracking them, are you profiling them, are you selling to them, anything like that?
Kirill Eremenko: Very interesting. Thank you for the outline. I think that third point, and the breadth to which it covers businesses, is extremely important. It pretty much covers almost any business that has any sort of connection with European customers, even an indirect rather than a direct connection. So now that we know who GDPR applies to, tell us a bit about what GDPR is. Why did it come about? I heard that before GDPR, before May 2018, the European privacy laws for internet use hadn’t been changed for about 20 years. So what triggered this new legislation?
Jessica Merlet: Yeah, that’s correct. So really what triggered it is that the old law was still based on old methods of data collection, old methods of data processing. Think how much technology has changed in the last 20 years. So that was really the driving force behind the GDPR. And although it took 20 years to really come into effect, of course, that’s simply because legislation is quite slow. It has been in the works for quite a long time. There are already new laws in the works as well, updates and different directives. There are some that have been signed. For example, we have a Council of Europe instrument that also deals with data privacy and data protection. It’s been signed by 47 of the member states, but it’s not yet at the level that the GDPR is.
Jessica Merlet: Really the driving force is how technology has changed in terms of data analytics in the last 20 years. Twenty years ago, we were not able to do such regular and such systematic processing, using things like profiling and scoring or location tracking, and process it on such a large scale at such a quick pace. So that’s something that has really changed, and that’s one of the main concerns of the European Commission in enacting the GDPR.
Kirill Eremenko: Okay, got you. Now we know who it applies to, where it came from. GDPR is such a broad topic, how long was the course that you took?
Jessica Merlet: It was an entire week. And that’s just for the data protection officer certification. There are many more levels in fact after that that can be done. Maastricht University for example is sort of the leading institute for GDPR compliance training in Europe right now. And it is putting into place even degree programs, master’s programs that focus on GDPR, that focus on data protection and privacy. So it’s ever changing, and it’s really growing to be a new industry at the current moment.
Kirill Eremenko: Okay, got you. So basically what that tells me is that we only have an hour here on this podcast, and we’ve got to cover off the most important things. And the way I think we’re going to structure this is to approach it through the steps that companies generally go through when dealing with data. There are three main steps. First, you need to capture the data, then you need to store it, and then you need to analyze it to get the insights. So I suggest we go in that direction, through those three steps, and we’ll see what GDPR says about each one of them. What do you say?
Jessica Merlet: Perfect, let’s do it.
Kirill Eremenko: Let’s do it. So let’s start with the capture of data. What does GDPR say specifically about companies being allowed to capture certain data points and not being allowed to capture other data points, and why is that in place? Why can’t a company just go out there and take all the data that is possibly available on its customers and capture all of it? Are there any restrictions there?
Jessica Merlet: All right. There are four main pillars of the GDPR. One of them is that we want to limit the collection of data to its specific purpose so that it’s collected for very specific, explicit, and legitimate purposes. That’s the first pillar, and I’m going to come back to that in a minute. The second pillar really is that we want to minimize the data that we’re collecting. So that means that we only want to-
Kirill Eremenko: Minimize?
Jessica Merlet: Yes, and that’s very important. We want to only collect the adequate data that’s adequate, that’s relevant, and that is limited to what is necessary for the purpose that we identified before. We want to ensure that the data is accurate so that there’s no, what we call toxic data out there, that it’s kept up to date, that it’s only collected where necessary again. And that it’s kept confidential, that it’s protected through security measures and that it’s only stored for certain periods of time.
Jessica Merlet: So those are some of the key concepts that the GDPR is considering. And really, the overarching concept is that we only collect what is needed and for a specific purpose, and that we’re accountable to the data subjects. The data subjects understand what’s going on, they don’t have any surprises. If we think back to Cambridge Analytica, the main takeaway from that is that the data subjects were surprised how their data was being used. And that’s what the GDPR wants to avoid happening that we don’t surprise our customers. We’re only allowed to process data under six very specific legal principles or legal bases.
Jessica Merlet: One of those is if we have consent, so does your data subject check a box that says, yes, you can process my data for X, Y, Z purpose? We’re all familiar with that. If it’s necessary to perform a contract. If you say, for example, I am going to purchase a pair of sneakers from your company, then of course you need the individual’s name and billing address and shipping address to send them those sneakers. So that’s another basis that we can process it on. There is a legal obligation-
Kirill Eremenko: Sorry, but you might not need like the IP address or you might not need the country or the browser that he’s using, whether he’s on a mobile or a desktop device in order to process that legal transaction?
Jessica Merlet: Exactly. Again, that goes back to data minimization. What do we actually need to fulfill the contract? And only what we need, we should not be collecting more data than is necessary. And we know that if you’re doing data analytics, that’s not necessarily what you want to hear because there is more data that [crosstalk 00:23:33]. Exactly, the more data that is out there that you can collect, the better the analytics are going to be, the more information you can find out. So one of the things we really have to look at is the interplay between what is driving the analytics or what is driving doing all of that, and the benefits that can be gotten from that versus the rights of the data subject to privacy and the rights of the data subject to have their data processed in a way that is accountable, in a way that’s transparent.
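To make the data minimization point concrete, here is a minimal Python sketch of the sneakers example. The field names and payload are hypothetical, not from any real system; the idea is simply that only the fields needed to perform the contract are kept, and incidental fields like IP address or browser are dropped.

```python
# Hypothetical sketch of data minimization for the "pair of sneakers" purchase.
# Field names and the payload structure are illustrative only.

FIELDS_NEEDED_FOR_CONTRACT = {"name", "billing_address", "shipping_address", "order_items"}

def minimize(raw_event: dict) -> dict:
    """Keep only the fields needed to fulfil the purchase contract."""
    return {key: value for key, value in raw_event.items() if key in FIELDS_NEEDED_FOR_CONTRACT}

raw_event = {
    "name": "A. Customer",
    "billing_address": "1 Example Street",
    "shipping_address": "1 Example Street",
    "order_items": ["sneakers"],
    # Captured by the web server but not needed to perform the contract:
    "ip_address": "203.0.113.7",
    "browser": "Firefox",
    "device_type": "mobile",
}

stored_record = minimize(raw_event)  # ip_address, browser and device_type are dropped
```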
Kirill Eremenko: I think this also meets the important issue we discussed before the podcast that if somebody captures my data like Facebook or Google or any other company, Microsoft, somebody captures my data, it’s still my data. It doesn’t belong to them. Can you tell us a bit more about that?
Jessica Merlet: Sure. Before I do that, I actually want to mention one more legal basis. I don’t want to go through all of them, but in fact potentially the most important one of those is what we call the legitimate interest legal basis. So you can also collect data if the processing is necessary for the legitimate interest of the data controller, so the company that is collecting the data. And this is where you can get a broader category, where you can maybe get into more analytics if necessary, if desired. But that’s what we want to look at more than anything: is there a legitimate interest?
Kirill Eremenko: What does that mean, legitimate interest?
Jessica Merlet: There’s actually a three step approach, if we want to get very technical, where we look at what the interest is, what the purpose of the collection is. So perhaps the purpose of the collection is that the company wants to offer a membership, a loyalty card. And it needs to know more about customers’ purchasing habits in order to be able to offer a loyalty card that can give customers discounts. It’s going to allow the data controller, the company that collects that data, to collect more data than may be strictly necessary to effect the purpose of having the individual purchase whatever it is the company is selling. Let’s say it’s a grocery store: just purchase the food and leave.
Jessica Merlet: This legitimate interest basis test might give the company, the grocery store chain more of a reason to say, I’m also going to track some other things. I’m going to ask them for their food preference or I’m going to ask them, track them in the store if they have an app open that shows how long they stand in this aisle, for example, through a geolocation tracking to say they really like the green smoothie selection or things like this.
Kirill Eremenko: Or you might ask them for their gender because some products are not relevant to male or female customers.
Jessica Merlet: Exactly, exactly. So under the legitimate interest test, we have more of a reason to do the analytics, to collect the data. But then of course, we still have to make sure that the purpose is legitimate. It’s not just so that the company can take that data and go do something else. It’s so that it can offer the stated discounts on its loyalty card. Are they actually doing what they’re supposed to be doing, what they say they’re doing, with the data? Is it necessary to collect all of that data to give the loyalty card and to track the purchasing habits so that the loyalty card gives relevant discounts? And then there’s also the question of balancing. That’s the third thing that GDPR considers: does an individual have more of an interest in not being tracked, in not having the analytics performed, than the company has in being able to offer this loyalty or discount card? So that’s really the most important one of the legal bases that we can consider.
Kirill Eremenko: Okay, got you. So legitimate interest. But couldn’t you just say that pretty much anything is a legitimate interest for the data processor? For example, I might not end up offering, I don’t even intend to offer, any discount or loyalty card to my customers, but I have a legitimate interest in collecting their geolocation data so I can segment my customers better. And maybe I will find some clusters from there that will allow me to save money on my marketing. Is that a legitimate interest?
Jessica Merlet: No, not at all. Because what necessarily is the benefit for the data subject?
Kirill Eremenko: Ah, I got you. So is there a legitimate interest of the data subject then?
Jessica Merlet: It’s not a legitimate interest of the data subject necessarily. It’s a legitimate interest overall. Our main goal though, and the GDPR’s main concern, is really focusing on that data subject. Is the data subject going to be surprised? Are we being transparent about why we’re collecting the data? Are we being accountable to the data subject? If there’s a legitimate interest, it needs to be very specific, for a purpose that is not going to invade the data subject’s rights over his or her data, and it needs to be genuinely necessary for the controller to achieve its interests. So that’s really what we want to ask ourselves. And legitimate interest does not fit everything.
Kirill Eremenko: Got you, okay, understood. Let’s maybe continue with those six legal bases, because you mentioned three: consent, necessity to perform a contract, and legitimate interest. Just for completeness’ sake, what are the other three?
Jessica Merlet: Sure. So there’s legal obligation, which is compliance with a law. I would say for your podcast listeners, some of this is potentially not that relevant. There’s public interest. So this is for use by a public entity, if there’s going to be a task carried out under official authority that’s been vested in the controller; again, that’s something that’s set forth in the law. And also if there is a vital interest. For example, if we have someone who’s unconscious and we need to process their data, maybe we need to look through their wallet or look through their phone and see who their in-case-of-emergency contact is. Something like that might be a vital interest. So those are potentially not that relevant to your podcast listeners, but those are the full six legal bases for processing under the GDPR.
Kirill Eremenko: Got you. Yeah, I agree that might not be as relevant, but on the other hand, it shows how comprehensive this legislation is. It takes even those situations into account. So far we’ve done the first step of the whole life cycle of data in an organization, out of capture, storage and processing. We’ve talked mostly about capture. In order to capture data, there are four main pillars. Just to recap, and correct me if I make a mistake anywhere. In order to capture data, these four main pillars have to be met. The purpose of the capture has to be specific, explicit and legitimate. We need to minimize the amount of data we capture; it has to be adequate and necessary. It has to be accurate so that there’s no false information being captured.
Kirill Eremenko: It also has to be kept confidential with security measures and stored only for certain periods of time. And we’ll get to storage in a second. And we also talked about the six legal bases, the reasons for the capture. You need to have one of these six legal bases in order to capture in the first place: either consent, necessity to perform a contract, legitimate interest, legal obligation, public interest or vital interest. Does that sum it up all right, Jessica?
Jessica Merlet: Yes, exactly.
Kirill Eremenko: Awesome. All right, so moving on to storage. You’ve already mentioned that data needs to be stored confidentially with security measures and only for certain periods of time. Let’s elaborate a bit more on that. What does GDPR in general say about how organizations can store the data of their customers?
Jessica Merlet: Right. In terms of the actual tech aspects of storage, that is not my specialty. That’s where you want to make sure that your data team is working together, both with the lawyer but also with a tech guy or girl. But in terms of how long we can store the data, that’s something that the GDPR is very concerned with, because we’re only allowed to store the data for as long as is necessary to perform whatever that legal basis was. For example, if you use the legal basis of performing a contract, then we don’t get to store that customer’s data forever. If the customer, for example, chose not to create an account, well, we don’t get to then take all of that customer’s data and do analytics on it, because the only purpose of storing that data, of collecting that data, I should say, was to send them that pair of sneakers, if we go back to the hypothetical.
Jessica Merlet: So we only are allowed to keep the data and do things with the data for as long as is necessary under that legal basis. And one thing that the GDPR is also concerned with is toxic data. So data that may be out there and data sets that are able to be identified of course, or data that’s out there in a company’s system that is old, that’s outdated, that’s no longer relevant, but that is just sitting around. And every company has this of course. And every company has quite a lot of it. So we want to ensure that really we have processes in place for deleting that data. And we definitely should not be using that data past the time that’s necessary. So that’s one thing that is really a focus of the GDPR, but it’s also something that has to be disclosed to the data subjects in that privacy policy that everyone is familiar with.
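As an illustration of the retention point, here is a rough Python sketch of deleting records once the period tied to their legal basis has lapsed. The retention periods and record fields are hypothetical; actual periods depend on the legal basis and any legal obligations to keep the data.

```python
# Hypothetical sketch of retention-based deletion to avoid "toxic data".
from datetime import datetime, timedelta, timezone

RETENTION_BY_LEGAL_BASIS = {
    "contract": timedelta(days=730),           # illustrative: two years after collection
    "consent_marketing": timedelta(days=365),  # illustrative: one year after collection
}

def is_expired(record, now=None):
    """True when the record has outlived the retention period for its legal basis."""
    now = now or datetime.now(timezone.utc)
    return now - record["collected_at"] > RETENTION_BY_LEGAL_BASIS[record["legal_basis"]]

records = [
    {"id": 1, "legal_basis": "contract",
     "collected_at": datetime(2016, 1, 1, tzinfo=timezone.utc)},
    {"id": 2, "legal_basis": "consent_marketing",
     "collected_at": datetime.now(timezone.utc)},
]

# Records past their window should be deleted rather than left sitting around.
ids_to_delete = [r["id"] for r in records if is_expired(r)]
```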
Kirill Eremenko: Interesting. So toxic data, are you referring to the notion of the right to be forgotten, where you can email Google and say, I want these links deleted because they are about me and they’re no longer relevant?
Jessica Merlet: No. But that is an important concept and one that we should probably talk about. Toxic data is more the idea that there’s just this old data floating around, data that’s not necessary, data that’s not being used for its original purpose. We see it a lot of times when data is purchased, or when data sets are purchased, that there might be some toxic data in there that is no longer accurate or that’s no longer really able to be used. We want to make sure that that’s minimized, that that’s deleted. But in terms of the right to be forgotten, that’s a great thing that you mentioned, because the GDPR has some concepts, not necessarily new ones, but important concepts that all companies need to be familiar with. And one of those is the right to ask that your information be deleted.
Jessica Merlet: You can go to a company, and this is very important for analytics providers as well. If you go to the original data controller, so the company that collected your data and you say, I don’t want you to have my data anymore. That company has to delete your data in so far as it’s not necessary to keep it for a legal requirement. So for taxes, for example. But in so far as it’s not necessary to have it for those kinds of purposes, the company has to delete it. And not only does that controller, so the company that collected it has to delete it, but it has to make sure that all of the data processors, all of the people down the line have to delete it as well. If the company has given it to its analytics company, to its data processor that’s doing analytics or something along those lines, the analytics company also has to be able to go in and has to identify, be able to identify if it can that individual data and delete it from not only its own system but how it’s being used as well.
Kirill Eremenko: Wow. So that is a lot of work, I can imagine, just finding individual people’s data. That’s insane. I could just go to Facebook and say... You know how hard it is to delete your profile on Facebook? They say it’s deleted, but the last I heard it’s still actually floating around and you can always restore it and things like that. So theoretically, what you’re saying is I can go to Facebook, or email them or write to support, and say, guys, I want this data removed completely, all my photos, all my profile, all my comments, everything needs to be gone. And I have the right to do that. Is that correct?
Jessica Merlet: Absolutely. You do. Whether or not they’re going to comply with that is a different thing. But under the GDPR, Facebook would have to comply. And that’s something that not just Facebook would have to comply with it, but again, everyone with whom it has shared the data. Sometimes if you look at these privacy statements, there are 40, 50, 400, 500 companies that the data controller is sharing a data subject’s data with. It has to be able to track down that data and to tell the data subject, I have given your data to X, Y, Z company. You have to be able to also ask X, Y, Z company to delete your data, and they have to comply as well. So it’s like I said, there’s not necessarily any company that’s actually compliant at this point because it is such a hard thing. It is such a difficult thing to track all of this, but the requirements of the GDPR are that we try to have better processes in place to do so.
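To illustrate how an erasure request might ripple through to processors, here is a hedged Python sketch. The registry structure and the helper functions are hypothetical placeholders, not a real API; the point is that the controller has to know who received the data and pass the deletion request along.

```python
# Hypothetical sketch of propagating a right-to-erasure request to downstream processors.

# Controller's record of which processors received each data subject's data.
data_sharing_registry = {
    "subject-42": ["analytics-vendor", "email-provider"],
}

def delete_locally(data_subject_id):
    # Placeholder: delete the subject's data from the controller's own systems,
    # except where a legal obligation (e.g. tax records) requires keeping it.
    print(f"Deleted {data_subject_id} from local systems")

def notify_processor(processor, data_subject_id):
    # Placeholder: in practice this would call the processor's deletion endpoint
    # or raise a formal request under the processing agreement.
    print(f"Erasure request for {data_subject_id} sent to {processor}")

def handle_erasure_request(data_subject_id):
    delete_locally(data_subject_id)
    for processor in data_sharing_registry.get(data_subject_id, []):
        notify_processor(processor, data_subject_id)

handle_erasure_request("subject-42")
```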
Kirill Eremenko: Okay, got you. Understood. All right. In terms of storage, we talked about how long: data has to be stored only for as long as is necessary to perform the specific legal basis for which it was captured in the first place. We talked about toxic data, that there must be no toxic data, and that information about how long the data is going to be stored has to be disclosed to the data subject. What about security measures? We’ve been hearing a lot about different hacks. Yahoo had, I think, about a billion accounts hacked, and lots of others. Insurance providers have been hacked recently, and I think one of the hotel chains was also hacked just a few months ago. So what are the security measures that GDPR requires to be in place when data is stored by the data controller?
Jessica Merlet: Sure. So the security measures are more on the tech side of things, but we do of course want to make sure that your company is compliant with industry standards. Is it doing what it should do based on the sensitivity of the data that’s been held based on how that industry normally stores and manages and secures data? In some companies, that may be having data secured in a warehouse with cameras. In other companies, that may be sufficient to simply have some manager level access and firewall and whatever other tech measures that are there. But part of their requirement is that we really do what’s necessary to the industry standards.
Kirill Eremenko: Got you.
Jessica Merlet: When we can.
Kirill Eremenko: So that’s the place where every company will need to refer to their industry standards?
Jessica Merlet: Yes. And also use a good tech company or a tech individual. This isn’t something where we look at data with just the lawyer coming in and saying, you have to do this, you have to do that. It’s really a collaborative team effort to have various people involved who know what’s going on with the data and know what the legal requirements are. Then you will be prepared to handle every situation that may come up. In addition, it’s important to consider that the burden of proof is on the processor. It’s on the company that’s doing the analytics to prove that it has sufficient security measures in place. So that’s not necessarily incumbent on the controller; that has to come from the processor, to show that it is up to industry standards. And it has to actually show this, that it’s up to industry standards, that it has sufficient security measures in place, before a single piece of data is transferred to that processor, before a single bit of analytics is done at all on that data.
Jessica Merlet: The processor has to have those security measures in place. And one thing that the data processors really need to think about is that every processing company has to have a processing agreement under article 28 of the GDPR. And those are going to set out the liabilities, the relationship between the processor and the controller. And that it also will sort of talk about the data flow, the security measures, all of that good stuff. So that’s something that data processors really need to be considerate of, and a way that they can protect themselves is through that processing agreement.
Kirill Eremenko: Okay. While we were speaking, you mentioned the terms and conditions and privacy policies. My question is in regards to these privacy policies. This is actually something we talked about before the podcast, I just wanted to clarify for the sake of everybody listening in. As a data subject, as a user, if I don’t read the privacy policy of a website or a product or a company or a contract or the terms and conditions, if I don’t read it fully, is that my fault?
Jessica Merlet: In a sense, yes.
Kirill Eremenko: So I’m responsible for that?
Jessica Merlet: Yeah, you are. None of us, except for maybe the lawyers who work in data protection, read the privacy policies, but it’s your responsibility. That being said, every company that has a privacy policy really needs to write it at quite a basic level. We don’t want to see legal jargon, for example, in the privacy policies. And in fact, there’s talk about the idea that maybe it’s better to even use a cartoon or a little video talking about privacy, just because that’s more easily understandable and would get people to look at it more. Now, and this is very important, I think, for your podcast listeners: you don’t always have to get that check-the-box consent to a privacy policy. But if you are processing sensitive data, so data that could be biometric data, political data, religious data, data about health or sex life or sexual orientation, that kind of processing does require affirmative consent to the privacy policy.
Jessica Merlet: So that’s something that’s very important as well for your podcast listeners: whether your company is a data controller or a data processor, is that able to be tracked? Has there been affirmative consent to the privacy policy?
Kirill Eremenko: Okay, got you. So affirmative consent, that’s when somebody has to check that checkbox saying, yes, I agree to terms and conditions and the privacy policy?
Jessica Merlet: Yes.
Kirill Eremenko: And it doesn’t count if the checkbox is already pre-checked?
Jessica Merlet: No.
Kirill Eremenko: Because you see that a lot, right? On some sites you put in your email or whatever, and the checkbox is there, but it’s already pre-checked for you. Well, that’s not affirmative consent, that’s just passive consent.
Jessica Merlet: Yes, that’s exactly correct.
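As a small illustration of the affirmative consent point, here is a hypothetical Python sketch of recording a consent event. Field names are made up; what matters is that the box defaults to unchecked, consent is tied to a stated purpose, and the event is timestamped so it can later be evidenced or withdrawn.

```python
# Hypothetical sketch of recording affirmative (explicit) consent.
from datetime import datetime, timezone

def record_consent(data_subject_id, purpose, box_checked_by_user):
    """Store a consent record only when the user actively checked the box."""
    if not box_checked_by_user:
        # A pre-checked box or silence is not affirmative consent.
        raise ValueError("No affirmative consent given; do not process for this purpose.")
    return {
        "data_subject_id": data_subject_id,
        "purpose": purpose,  # e.g. "profiling for loyalty-card discounts"
        "granted_at": datetime.now(timezone.utc).isoformat(),
        "mechanism": "unchecked-by-default checkbox",
    }

consent_record = record_consent("subject-42", "profiling for loyalty-card discounts", True)
```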
Kirill Eremenko: Interesting. So these things are important to know, like the distinction that some types of data, sensitive data, which you mentioned, data about sex life, sexual orientation, religion, biometrics, require that affirmative consent?
Jessica Merlet: Yes. Actually, Kirill, if I can interrupt for just a minute? I want to make it very clear for your podcast listeners also that there are three categories in which affirmative consent is necessary. The first one is if you’re processing that sensitive data; there are several categories, or rather areas, of sensitive data. The second one is if you’re doing automated decision making. Here’s where our data analytics comes in, of course. If there’s automated decision making, decision making that doesn’t necessarily have human oversight to it, AI for example, affirmative consent, or explicit consent as it’s really called, is required. And then also if we’re transferring the data to a country that’s not adequate, and most countries are not adequate. So if there’s a data transfer going from Europe to China, for example, or wherever it may be, that affirmative consent, that explicit consent, is also required.
Kirill Eremenko: Got you, okay. That’s really cool. Thank you for clarifying. So there’s three situations or categories when affirmative consent is required. First is the sensitive data when you’re collecting sensitive data. Second category is when you are performing automated decision making, which ties into the analytics stuff that we’re going to be talking about just now. And the third category is if we are transferring data to a country that is not adequate. Okay, got you. All right. I think that clarifies quite well the whole situation or it gives a good overview of the storage. So far we’ve talked about capture and storage. And finally we get to the fun part, the processing of data to extract the insights.
Kirill Eremenko: And what really surprised me here, when we were chatting about this podcast a week ago, was that you mentioned that one of the main reasons GDPR came around is not actually to do with the data itself, but with what is done with the data: how companies are now analyzing data, what analytics is being applied. And a lot of the sections of the GDPR actually apply not just to the storage or the capture of data, but to what you are and are not allowed to do when you’re processing the data. So let’s get started on that. What’s the bird’s eye overview of processing and GDPR?
Jessica Merlet: Sure. I think it’s important that we make a distinction between the data controller and the data processor because those are two separate concepts although they can also be one thing.
Kirill Eremenko: Just to clarify, when I was saying processing just for our listeners, I meant any kind of analytics that we’re doing. This is probably different, like we’re using different terminology here. So let’s stick to yours. So yeah, you’re right, we have controller and processor.
Jessica Merlet: Yes. The controller is who is collecting the data, and who is determining what gets done with the data. The controller is usually your first company involved. They collect the data and they say, I’m going to give it to this processor to do the data analytics on. The processor under the GDPR, we define it as a natural or a legal person, a public authority, an agency or any other body that processes data on behalf of that data controller. So on behalf of the company that’s collecting the data. So it can be one in the same.
Kirill Eremenko: And the processor could be the controller?
Jessica Merlet: Exactly, the processor could be the controller. If the processor is also doing some collection of data, the processor there could be a co-controller of the data. And also if the processor is making decisions itself. An example that we’ve had is an outside marketing company hired by company A, so company A hires company B, the outside marketing company. Company B, the marketing company, does data analytics to decide maybe what products should best be directly targeted or directly marketed to the consumer. Well, if company B, the analytics company, is also making the decision, so not just telling company A here are the results but really undertaking to then perform something else. Is it then doing the marketing campaigns? Does it do the analytics and the marketing? Does it make some determination as to the outcome of what is done with that analytics or what’s done with that data?
Jessica Merlet: Well, then it also maybe is going to be a data controller. So it’s important that we say these are two separate concepts. But in practice, the same company can play different parts at different points: it could be a controller here and a processor there.
Kirill Eremenko: Okay, understood. You have two entities or there’s two roles basically, controller and the processor of data. As data scientists, we use different types of algorithms, different types of machine learning, insights, AI, deep learning, just business intelligence, lots of different approaches we have to analyze data. What does GDPR say about what is allowed and what’s not allowed for us to perform on the data?
Jessica Merlet: So again, you’re only allowed to perform what’s necessary, which goes back to that legal basis. The analytics company is not allowed to take just any data set that’s been given to it, and I’ll get back to that, and it’s not allowed to just take anything and perform any analytics on it at once. It’s only allowed to do so to the extent that has been disclosed to the data subject, to the extent that it has been told to do so by the data controller. And that’s very important: the data processor really acts at the whim of the data controller and isn’t out there doing analytics for various other things. But it’s also important here to keep in mind when data becomes de-identified. We’re of course only talking about data that is identifiable to a certain data subject. So if the data set cannot be identified, then that gets us into a different concept.
Jessica Merlet: However, the GDPR is very concerned, especially when we start combining data sets, that if two data sets out there in the world can be combined to re-identify that data, and not just necessarily by that data processor, then we still have to act in compliance with the GDPR. We still have to only process the data for a specific legal basis. We still have to process it at the whim of the controller. So that’s something that’s very important. We like to think that data can be de-identified, but what we’re finding, and what the GDPR recognizes, is that a lot of data can of course be re-identified, especially when we get AI involved.
Kirill Eremenko: Okay. Very interesting. So you actually touched on a very important point, I think, for everybody listening. As you mentioned, if data can be re-identified, then it still needs to comply as if it hadn’t been de-identified; we have to treat it according to the GDPR. But does that mean also that if we have properly de-identified data and there’s no way it can be re-identified, it’s just, I don’t know, geographical locations of customers with latitude and longitude and nothing else in this specific data set, does that mean we don’t have to apply GDPR to it and we can do whatever we want with it?
Jessica Merlet: Again, it really comes back to whether it cannot at all be combined with another data set. In the course, we examined an example about farmers who had received subsidies. They said, well, these are the farmers that have received subsidies. And when you look at how many farmers there actually were based on the geolocation, there’s only one potential farmer that farms wheat in that part of the Czech Republic. So we really, really want to be careful as to whether it can be re-identified at all out there in the world. And of course, as more data sets become available, as this grows and grows and grows, it’s going to be difficult to actually say whether or not the data eventually can be re-identified.
Jessica Merlet: But I think the goal is really just that we want to try our best. And as data analytics persons and companies, keep in mind that there may be data sets out there that can be combined with yours. If there aren’t, then you can do a bit more. But if there are, then we want to be a little bit more careful about what we’re doing.
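In the spirit of the wheat-farmer example, here is a rough Python sketch of a k-anonymity-style check on quasi-identifiers before treating a data set as de-identified. The column names and data are hypothetical, and a real assessment would also have to consider what other data sets exist in the world that could be combined with yours.

```python
# Rough sketch: does any combination of quasi-identifiers single out one individual?
from collections import Counter

def smallest_group_size(rows, quasi_identifiers):
    """Size of the smallest group of rows sharing the same quasi-identifier values."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values())

rows = [
    {"region": "CZ-South", "crop": "wheat", "subsidy": 12000},
    {"region": "CZ-South", "crop": "barley", "subsidy": 9000},
    {"region": "CZ-South", "crop": "barley", "subsidy": 7000},
]

# A smallest group of 1 means someone (the lone wheat farmer) is uniquely
# identifiable from the quasi-identifiers alone, so the data set should not be
# treated as anonymous.
k = smallest_group_size(rows, ["region", "crop"])
```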
Kirill Eremenko: Okay. Okay. Understood. And speaking of data scientists, now that we’ve gone into the processing part. As a data scientist, let’s say I’m working for organization X. And I am working with a certain data set. Am I responsible, am I legally liable under GDPR for what happens to that data or is it the organization that takes the responsibility from me and is responsible on my behalf?
Jessica Merlet: Yeah. That gets us more into the concept of employer liability. It depends on what your relationship with that employer is. Are you an independent contractor, are you an employee, what have you? A lot of the ultimate responsibility under GDPR does rest on the actual data controller. So that’s sort of your first line. But also the data processor is the company that’s in charge of showing that it has, for example, sufficient security measures in place. So both of the companies can be liable. As far as your personal liability, that depends, again, on this concept of employer liability.
Jessica Merlet: So is your employer going to turn around and sue you? Potentially, if you make an egregious error. But if you simply failed to comply with GDPR, probably not. Of course, I can’t say yes or no. If you’re an independent contractor for example that’s been hired out, you have your own small data analytics company, you’re hired to come do some work for a processor. In that sense, you may have some more liability. But in terms of the fines under GDPR and whether you’re going to be responsible under GDPR to a data protection authority, that’s at this point a little more of a far leap. I mean, the data protection authorities, they don’t want to be issuing fines necessarily.
Kirill Eremenko: 50 million euro fines.
Jessica Merlet: Well, that’s for Google. But what we’re seeing is that the data protection authorities say they just want compliance. So the first thing they’re probably going to do is send a letter, a warning letter. They may halt some processing activities; that is an important thing that very well may happen, that processing has to stop because it’s not being done in a compliant manner. Those are the steps that are usually going to be taken before a fine is issued. Although, I think your podcast listeners should be aware that the fines are quite hefty. It’s up to 4% of annual turnover or 20 million euros, whichever is higher, that’s available to the data protection authorities if they want to go with that.
Kirill Eremenko: 4% of annual turnover or 20 million euros, whichever is higher of the two?
Jessica Merlet: Mm-hmm (affirmative).
Kirill Eremenko: Wow, that’s insane. So not even profits, it’s revenue. That is outstanding. Okay. The comments that you mentioned, just for our listeners out there, if you’re a freelancer, those really do apply to you because you’re effectively an independent contractor. Is that correct, Jessica, that when you’re a freelancer you’re like an independent contractor, and kind of the same principles apply to you?
Jessica Merlet: Yeah. Freelancer, independent contractor, it’s exactly the same thing.
Kirill Eremenko: Okay. Got you, got you. All right. Very, very interesting. And I wanted to specifically talk about AI, artificial intelligence. You mentioned there’s some specific requirements within GDPR relating to artificial intelligence, and we’re seeing more and more companies adopt AI. It’s a very powerful technology, but it’s also very fresh, very new in the world. What does GDPR say about artificial intelligence?
Jessica Merlet: Well, the GDPR is very concerned with whether data is being processed by regular and systematic processing. Is it being processed where there is some kind of profiling, where there’s scoring, where there is location tracking, for example? Or is there behavioral advertising, is there monitoring? Is there anything like this being done without human oversight, or by automated means? One of the concerns with GDPR is that we’re going to see companies making decisions that have a legal effect on an individual because AI has come in and said, these are this individual’s or this group of individuals’ consumer habits. For example, we did a case study about a health and fitness club.
Jessica Merlet: Does the health and fitness club have a membership program, or does it have a bracelet that the members wear that tracks them, to say that they spent three hours here, they spent 30 minutes in the sauna area, and then they went and sat at the smoothie bar for 45 minutes and had a chat? And that data may be processed through automated means to determine, for example, whether we think that person is wealthy or we think that person is poor. That’s a legal effect. That’s something that the GDPR wants to avoid if it’s being done simply by automated means. So that’s a big concern. We want to see that there’s some kind of human oversight if possible. And if there’s not human oversight, to really take a hard look at what the end results of this are. And is the data being processed based on consent, or based on transparency and accountability to that data subject? Do they know that decisions are being made about them based on automated means?
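A minimal sketch of the human-oversight idea, assuming hypothetical decision types and a review queue: automated outputs that would have a legal or similarly significant effect are routed to a person instead of being applied automatically.

```python
# Hypothetical sketch of keeping a human in the loop for significant automated decisions.

DECISIONS_WITH_SIGNIFICANT_EFFECT = {"membership_rejection", "credit_limit", "price_tier"}

human_review_queue = []

def apply_decision(data_subject_id, decision_type, model_output):
    """Apply low-impact decisions automatically; queue significant ones for human review."""
    if decision_type in DECISIONS_WITH_SIGNIFICANT_EFFECT:
        human_review_queue.append((data_subject_id, decision_type, model_output))
        return "queued_for_human_review"
    return "applied_automatically"

status = apply_decision("member-7", "membership_rejection", {"score": 0.91})
```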
Kirill Eremenko: And what do you say about the whole concept of AI being a black box? When using AI and technologies such as deep learning, we often come across situations where we cannot actually explain what the AI is doing. We have inputs, then there’s a neural network, maybe some reinforcement learning along the way, and then there’s an output, and that’s it. We have our inputs, we have our outputs, which are fantastic, they help us market better to customers or segment them better. But we don’t know what the AI is doing with the data. What does GDPR say about that?
Jessica Merlet: Well, it says that we have to be accountable. If the data protection authority comes and knocks on your door and says you’re using AI, we have to be able to tell them how the algorithm works. We have to be able to pull out an individual’s data if possible. We have to be able to modify that individual’s data. We have to be able to tell the protection authority, more than anything else, how that data was used, what it was used for. We don’t necessarily have to hand over the algorithm, of course, that’s a trade secret. But we have to be able to be accountable. We have to be able to show that we have some concept of what’s going on, and that we can at least give an explanation as to what data was put in and how it was used. And that’s the main goal more than anything else. I don’t personally think that AI and the GDPR are incompatible. It’s more just, are we transparent? Are we accountable? What do you think? Do you think that they’re compatible?
Kirill Eremenko: Good question. I find it quite a gray area, and I think ultimately they can be compatible. But at the same time, there’s a lot of concern. And that’s why one of the biggest trends, Hadelin and I did a podcast at the start of the year on the trends for 2019, and one of the biggest trends is explainable AI, because companies want to err on the side of caution and would rather sacrifice a little bit of the efficiency of their artificial intelligence but at the same time be able to explain what is going on. And I’ll give you a specific example that Ben Taylor shared with me, I think it was Ben Taylor. It also involves this whole notion of accountability that you mentioned. So for instance, we have a data set where, let’s say, a government authority is giving out fines to people for their driving.
Kirill Eremenko: We know that people can potentially be biased, whether based on race or ethnicity or gender, and speeding fines or other types of fines can be given out in a biased way. So we want to replace this whole process with an artificial intelligence to make it unbiased. We set up an artificial intelligence, a deep learning algorithm with a neural network inside, and we train it on all the past data. But guess what, all that data that we have, with the circumstances described in a digital way and the outcome of whether the fine was given or not, is already biased. It already inherently contains that racial bias, for example. The deep learning or the AI that you create is going to be biased by default because it was trained on biased data.
Kirill Eremenko: And then if you launch it into production, even though you might say that we’re using an artificial intelligence and therefore we don’t have that human bias and we’re not racist in the way we perform this task, the AI can actually be racist because it was trained that way. And then the notion of accountability comes in, because you have an AI which is a black box and you ultimately don’t understand what’s going on. The outcomes are racist, and therefore you are now accountable as a business for that whole thing. So it’s kind of a give and take, and it’s an interesting combination of these two factors. On one hand, you have extremely powerful tools such as artificial intelligence. On the other hand, you do need to know how to use them properly, otherwise you can get into a lot of trouble.
Jessica Merlet: Yeah. And you said that you’re finding companies out there that err on the side of doing what they can to be accountable and to be transparent. I think that’s great.
Kirill Eremenko: Yeah. I wouldn't say I've met a lot of these companies, but based on the trend itself, that explainable AI is more and more talked about and considered, I think that's where the world is going at this stage.
Jessica Merlet: And that's where the GDPR and the law also want to see the world going: that we don't just have things running around out there, with processing pushed to every possible extent, without anyone being able to say why it's been done and what the outcome is. That's really the point, that things can be done so long as everyone is on the same page and agrees with them.
Kirill Eremenko: Yeah, totally, totally agree. Jess, this has been an extremely exciting podcast, and we're slowly coming to the end. I'd love to talk about a ton of other questions, but at some point we're going to have to wrap up. So an important point for us to cover here is where companies and anybody listening can get additional information. And that ties into the whole notion of a data protection officer. GDPR requires that companies that fall under it have a data protection officer. Tell us a bit about that. What is a data protection officer, and why do companies need one?
Jessica Merlet: Sure. Under the GDPR, the data protection officer is a new position, if you will. The data protection officer is really the individual that is the flag bearer, I guess you would say, for the company that shows its accountability. If a data subject comes and says, hey, company, I want to know every single piece of data that you have on me. I want to know every processor you shared it with. I want to know what those processors are doing with it. I don't like that you're using this processor, I want you to tell them to delete my data. Whatever it is, the data protection officer is the individual that is really the point person for dealing with all of that, and is also the person that's responsible for dealing with the data protection authorities.
Jessica Merlet: Every country, every member state in the EU has a data protection authority, and the DPO is the individual responsible for dealing with those. It is mandatory to have a data protection officer under a few circumstances. One of those is if the processing is being carried out by a public authority. The second is if regular and systematic monitoring of individuals is at the core of the processing activities. So again, here we have data analytics coming into play. Or if sensitive data is at the core of the processing activities and there's large-scale processing. So those are the times that companies are required to have a data protection officer. It can be somebody in-house, or somebody hired on retainer if you're a small company, whatever it may be. But that is really the role of the DPO.
Kirill Eremenko: Got you. And the data protection officer needs to be based in Europe, is that correct?
Jessica Merlet: The data protection officer does not need to be based in Europe, but does need to be well trained and well versed in the GDPR, and needs to have a pretty solid understanding of it. There are certification courses and training available. My company also offers DPO assistance on retainer for companies that may not need, or may not have the resources, to hire their own person in-house.
Kirill Eremenko: And actually, congratulations on that. You mentioned this to me just before, and I think it's a big step, and it's actually needed. With this new legislation coming out, there are so many businesses out there, especially small to medium-sized enterprises, that just don't have that presence or don't have the budget to train up somebody who's going to be capable of being their data protection officer. A company such as yours, where they can get that person on retainer and be confident that everything's going to be done well, that's just a lifesaver. For anybody listening out there, I highly recommend that if you need a data protection officer, you get in touch with Jessica and she can help you out, set you up, or at least provide you the right guidance and point you in the right direction. So thank you very much for mentioning that, Jessica.
Jessica Merlet: Thank you.
Kirill Eremenko: Awesome. Okay. On that note, I think we’re going to wrap up. This has been a fantastic podcast. Before I let you go, what is the best way for our listeners to contact you, get in touch with you, Jess?
Jessica Merlet: Sure. So I think LinkedIn is probably the easiest. I think my name will probably be there in your show notes, but it’s Jessica Merlet, M-E-R-L-E-T. Or you can also send me an email at info@merletlaw.com.
Kirill Eremenko: Got you, awesome. Well, thank you so much Jess once again for coming on the show and sharing all of these amazing insights. I’m sure it’s going to be super valuable for those of our listeners out there that can’t wait to soak in all this knowledge and actually enhance their careers with it. Thank you so much.
Jessica Merlet: Thank you.
Kirill Eremenko: So there you have it ladies and gentlemen. That was Jessica Merlet, the founding and principal attorney of the law office of Jessica Merlet and its European counterpart, Merlet Legal Consulting. I hope you enjoyed this episode as much as I did. There was so much going on, so many different aspects of GDPR and data privacy that we talked about. It’s really hard to even pick my favorite one. Probably all of these things combined, the whole notion that it is important for data scientists, data analysts, businesses that actually use data to keep these things in mind, to make sure that they’re treating their data or the customers’ data properly and looking after it and that they do take that responsibility.
Kirill Eremenko: And all the insights that Jessica shared with us today are definitely going to be helpful for us to stay on track with that. If you’d like to find out more information or get in touch with Jessica, then make sure to head on over to www.superdatascience.com/237. That’s www.superdatascience.com/237 where you will get all of the links and materials mentioned in the show, the transcript for the episode, the cheat sheet that we’ve prepared for you with a summary of everything that we talked about today. And of course, the URL to Jessica’s LinkedIn and her email where you can contact her.
Kirill Eremenko: Don’t forget that Jessica has specific services tailored for GDPR compliance and for helping set up a data protection officer for startups, small to medium enterprises, and any kind of business that needs assistance in that space. And of course, if you need some legal advice or you want to just hit up Jessica with some questions, make sure to connect with her on LinkedIn and stay in touch. And finally, if you enjoyed this episode and you know somebody who has questions about GDPR, or you know a business owner, an executive or a director, somebody who you think might benefit from the information that was shared on this episode today, then don’t just keep it to yourself. Share it with them, send them a link to this podcast.
Kirill Eremenko: The best thing to send is www.superdatascience.com/237. That’s where this episode is available plus all the show notes. And they can get to these insights as well. So if you know somebody that can benefit from this, then make sure to send them this link, www.superdatascience.com/237. On that note, thank you so much for being here. I really appreciate your time, and I hope we delivered on our promise of amazing podcasts. And I look forward to seeing you back here next time. Until then, happy analyzing.