Italy’s recent ban on ChatGPT has drawn much media attention and a fair amount of criticism from technology enthusiasts. Last week, the European Data Protection Board (the EU institution where all privacy watchdogs sit) set up a task force on the ChatGPT case, and things got even more interesting.
(Update: as of April 29, ChatGPT is again available in Italy. The Italian data protection authority has not published any new decision on ChatGPT yet, but it has issued a press release.)
This task force might be a big deal. The legal issues raised by ChatGPT are not unique: in fact, most of them are common issues for generative AIs. Given the EDPB’s involvement, the ChatGPT case will likely significantly impact the future of generative AIs in the EU. So let’s see what happened exactly and what legal issues are at stake.
Let’s dive in!
The story so far
On March 30, following an own-initiative investigation, the Italian data protection authority (GPDP) published an urgent decision provisionally blocking ChatGPT’s activity on Italian territory. The authority later announced that it was in touch with ChatGPT’s owner, OpenAI, and had discussed possible ways to make ChatGPT GDPR-compliant.
On April 11, the GPDP published another provisional decision about ChatGPT. The decision ordered OpenAI to implement several compliance measures and promised that the ban would be lifted should the company comply by April 30.
The second decision is not a green light for ChatGPT. The first decision resulted from an urgent procedure rather than an in-depth investigation. The GPDP can further investigate ChatGPT’s data processing and issue new decisions, should they be needed.
Finally, on April 14, the EDPB announced that it set up a task force to deal with ChatGPT’s case. The task force will work to find common ground between authorities on the legal issues raised by the ChatGPT case. Because data protection authorities themselves are involved in the task force, its work will impact how future cases are handled throughout Europe.
Update: the ban has been lifted since April 29. The GPDP stated in a press release that OpenAI managed to meet some of its demands, including the implementation of systems to handle requests to opt out of the processing of data for training the AI model, as well as requests to erase inaccurate data. Other requirements are still to be met, including the implementation of a more robust age verification system.
The GPDP also notes that its investigation of ChatGPT is still ongoing.
What are the legal issues with ChatGPT?
The GPDP’s decisions are quite succinct, which is standard for urgent procedures. So we will look at the issues pointed out by the GPDP from a broad perspective and see what they mean for generative AIs in general.
Before we dive into the thick of things, also note that ChatGPT processes data from two categories of people (or data subjects in the legal jargon). ChatGPT was trained (and is constantly re-trained) on both its conversations with users and a larger database previously collected from the Internet. The database is where the really big problems come from because the data belong to millions of people who have nothing to do with ChatGPT at all.
The main issue at stake is the lack of a legal basis. As our blog explains, if you process personal data, you need a legal basis under the GDPR, which is essentially a legal justification.
Data from users is not a big deal because you can just collect consent (OpenAI failed to do so, but this can be easily fixed). The real problem is everyone else, and by everyone else, we pretty much mean the world at large.
According to OpenAI’s FAQs, ChatGPT was trained on “vast amounts of data from the Internet written by humans, including conversations.” The FAQ suggests that ChatGPT does not scrape the Internet now but did until 2021 (or at the very least, that it was fed with scraped data until that year). Bottom line, ChatGPT could be processing personal data from anyone who wrote content on a publicly accessible web page until 2021.
That’s a lot of personal data and a big responsibility for OpenAI. It is not easy for a company to find a legal basis for processing tons of data from people who have nothing to do with its services. This is why legal bases are a big problem for generative AIs in general.
What could the solution be? Consent is obviously out of the question given the number of data subjects involved. So is the legal basis of contract, since most data subjects do not use ChatGPT themselves.
Based on the second decision (see note #1), we believe the GPDP is looking at legitimate interest. Legitimate interest is a tricky legal basis because it requires the data controller to ensure that the processing is fundamentally fair, if necessary by implementing safeguards for the rights of the data subjects. These requirements are not trivial when dealing with a black box AI, so it will be interesting to see what solutions OpenAI comes up with.
The GPDP pointed out that OpenAI did not provide the data subjects with privacy notices. Again, this is easily fixed for the users and not-so-easily fixed for everyone else, because OpenAI needs to reach a massive audience. As the GPDP pointed out, OpenAI will likely need to engage the media for a large-scale information campaign.
But what about all the other generative AIs? Should they all do the same? As silly as it sounds, should we expect a future where every other newspaper ad is a privacy notice for some AI?
Exercising data rights
Privacy notices are important because they tell you what your data rights are (for instance, accessing your data or having it erased) and how to exercise them. In its second decision, the GPDP ordered OpenAI to provide the data subjects with a way to exercise these rights. This will not be trivial, especially regarding the millions of non-users whose data are being processed.
A somewhat similar problem surfaced in the pre-GDPR era when people started asking Google to de-reference their personal data from Google Search. This is how we got Google Spain, a landmark ruling from the EU Court of Justice that strengthened the right to erasure in EU privacy law.
Strict enforcement of the right to erasure and other data subject rights could help alleviate some of the privacy issues raised by AIs. But with Google Search, you can just type your name and see what comes up. Things are nowhere near as easy with an AI.
Let’s say you ask OpenAI for access to your personal data. OpenAI will first need to retrieve all your personal data from the dataset. The GDPR’s definition of personal data is quite broad, so retrieving your data will require more than just filtering the dataset by your name or other identifiers (say, a forum username). More sophisticated technical approaches will be necessary, and in all likelihood, there will be no guarantee that all your personal data will be accurately retrieved.
Should we just assume that if an AI as advanced as ChatGPT cannot recognize certain data as personal data, then not treating them as such is safe enough in practice? This pragmatic approach does not sound too bad and might even make sense from a legal point of view (see note #2).
But ChatGPT gets smarter by the day and constantly expands its dataset by talking to its users. Just because it cannot recognize certain data as personal data today does not mean it will not be able to do so tomorrow. Should the data subjects forward access requests every day, just to be safe? Should OpenAI periodically scan the dataset and update every single data subject who filed an access request in the past?
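To see why filtering by name or username falls short, here is a minimal Python sketch. Everything in it is a made-up illustration: the `find_direct_mentions` helper, the toy records, and the identifiers are assumptions for the sake of the example and say nothing about how OpenAI actually stores or searches its training data.

```python
import re

def find_direct_mentions(records, identifiers):
    """Return the indices of records containing any identifier as a whole word.

    Hypothetical helper: a naive, case-insensitive keyword filter.
    """
    patterns = [
        re.compile(r"\b" + re.escape(identifier) + r"\b", re.IGNORECASE)
        for identifier in identifiers
    ]
    return [
        index
        for index, text in enumerate(records)
        if any(pattern.search(text) for pattern in patterns)
    ]

# Toy dataset (invented). Record 2 never names the person, yet the
# combination of details could still identify her in context.
records = [
    "Jane Doe posted a recipe on the forum.",
    "User jdoe_84 asked about tax deadlines.",
    "The only dentist in Smallville with a red vintage car retired last year.",
    "General discussion about the weather.",
]

hits = find_direct_mentions(records, ["Jane Doe", "jdoe_84"])
print(hits)
```

The filter flags the first two records and skips the third, even though a description like that one could plausibly count as personal data if it singles out one person. Catching such indirect references is the hard part, and a keyword search does not help with it.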
The right to have your data corrected and updated looks problematic too. All of the data within the original training dataset is outdated by two years or longer by now, which is not a good start.
Additionally, both input and output data can be personal data. This means that you have a right to an accurate output with regard to your personal data. But how would you even find out that someone, somewhere, learned inaccurate information about you through ChatGPT? And how can OpenAI ensure that ChatGPT’s output is accurate when it changes all the time, even in response to identical queries?
Request authentication is also going to be a puzzle. If someone sends you a request to access their data, you must comply. But you also need to ensure that the request came from the actual data subject to avoid disclosing their personal data to someone else. Authenticating a request can be tricky, even more so when the data subject has nothing to do with the service you provide (and cannot be required to prove their identity by providing known information, such as login credentials). OpenAI might have to deal with many such requests soon, and it will not be a walk in the park.
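As a rough illustration of what authenticating a request from a non-user might involve, here is a minimal Python sketch of a challenge token tied to a contact channel (for instance, an email address or a forum account). This is purely an assumption for the sake of the example: the function names, the request ID format, and the whole flow are hypothetical, and nothing here reflects a procedure described by OpenAI or required by the GPDP.

```python
import hashlib
import hmac
import secrets

# Hypothetical server-side secret, generated once and kept private.
SECRET_KEY = secrets.token_bytes(32)

def issue_challenge(request_id: str, contact: str) -> str:
    """Create a token bound to one request and one contact channel.

    The controller would send this token to the contact channel and ask
    the requester to send it back.
    """
    message = f"{request_id}:{contact}".encode("utf-8")
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()

def verify_challenge(request_id: str, contact: str, token: str) -> bool:
    """Check the returned token in constant time."""
    expected = issue_challenge(request_id, contact)
    return hmac.compare_digest(expected, token)

token = issue_challenge("req-001", "jdoe_84@example.org")
print(verify_challenge("req-001", "jdoe_84@example.org", token))
print(verify_challenge("req-001", "someone.else@example.org", token))
```

Even when this check succeeds, it only shows that the requester controls the contact channel. It does not show that a given piece of data in the training set is about that person, which is the difficult part when the data subject has no account with the service.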
The GPDP pointed out that OpenAI failed to implement age verification for users, which allowed minors under 13 to use the service and potentially be exposed to age-inappropriate content. This is probably not too relevant to AIs in general, but it’s still worth mentioning for the sake of completeness.
Will the AI Act help with these issues?
ChatGPT raises several legal issues, and it will be interesting to see how the EDPB task force will handle them. But, of course, the burden of regulating AI in the EU is not on the EDPB alone.
The EU is working on a Regulation proposal known as the AI Act. The draft provides an extensive set of AI rules, including data quality standards and duties to manage risks. Will the upcoming Regulation help with some of the privacy issues raised by AIs?
It likely will, to some extent. But it will not be the silver bullet.
The AI Act is no GDPR for AI, so to speak. It’s really not a privacy law: its main focus is regulating the EU market through common security standards for AI products. Some of its provisions might strengthen privacy, but that is not its main purpose.
Additionally, the strictest obligations under the Act are reserved for specific types of high-risk AI systems, which do not include generative AI under the current draft.
But in the near future, the European Parliament may push for a revision of the AI Act draft in order to include generative AIs in the high-risk category, as reported by Euractiv. The risk classification system is one of the most contentious points of the draft Regulation, and the ChatGPT case certainly had an impact on the Parliament's change of heart.
Update: the European Parliament reached a provisional agreement on a new draft of the AI Act proposal. The new proposal classifies generative AIs such as ChatGPT as high-risk systems.
Regardless, we should not expect the AI Act to solve all the privacy issues raised by AIs. The GDPR will still be crucial in this regard, which makes the work of the EDPB task force all the more important.
We at Simple Analytics believe that privacy matters. This is why we strive to explain privacy news in an accurate and accessible way. We believe there will be no privacy-friendly future without a privacy-aware public.
We also believe that we can all contribute to the cause of privacy. This is why we build a web analytics tool to provide you with all the insights you need, without collecting personal data and tracking visitors. Privacy is our absolute priority, which is why Simple Analytics is built to do more, with less. If this sounds good to you, feel free to give us a try!
- #1 The GPDP mentions that the data subjects, including non-users, should be allowed to object to the processing. This is a good hint because data subjects only have a right to object when the processing is based on legitimate interest, and in other specific situations (see Art. 21 GDPR).
- #2 A plausible case could be made that in this scenario, the data are not really personal data under the GDPR. The notion of personal data under the GDPR is context-based, which allows for some funny reasoning. If you’re curious, see Recital 26 GDPR and the GDPRhub commentary.