Privacy issues of ChatGPT: This is what's ahead of us

(This blog was updated following the adoption of the AI Act)

Italy’s recent ban on ChatGPT has drawn much media attention and a fair amount of criticism from technology enthusiasts. Last week, the European Data Protection Board (the EU institution where all privacy watchdogs sit) set up a task force on the ChatGPT case, and things got even more interesting.

(Update: as of April 29, ChatGPT is again available in Italy. The Italian data protection authority hasn't published any new decisions on ChatGPT yet, but it has issued a press release.)

This task force might be a big deal. The legal issues raised by ChatGPT are not unique: in fact, most of them are common to generative AIs. Given the EDPB’s involvement, the ChatGPT case will likely have a significant impact on the future of generative AIs in the EU. So let’s see what happened exactly and what legal issues are at stake.

Let’s dive in!

The story so far

On March 30, after an own-initiative investigation, the Italian data protection authority (GPDP) published an urgent decision to provisionally block ChatGPT’s activity on Italian territory. The authority later announced that it was in touch with ChatGPT owner OpenAI and was discussing possible ways to make ChatGPT GDPR-compliant.

On April 11, the GPDP published another provisional decision about ChatGPT. The decision ordered OpenAI to implement several compliance measures and promised that the ban would be lifted should the company comply by April 30.

The second decision is not a green light for ChatGPT. The first decision resulted from an urgent procedure rather than an in-depth investigation. The GPDP can further investigate ChatGPT’s data processing and issue new decisions, should they be needed.

Finally, on April 14, the EDPB announced that it had set up a task force to deal with ChatGPT’s case. The task force will work to find common ground between authorities on the legal issues raised by the ChatGPT case. Because data protection authorities themselves are involved in the task force, its work will impact how future cases are handled throughout Europe.

Update: the ban was lifted on April 29. The GPDP stated in a press release that OpenAI managed to meet some of its demands, including the implementation of systems to handle requests to opt out of the processing of data for training the AI model, as well as requests to erase inaccurate data. Other requirements are still to be met, including the implementation of a more robust age verification system.

The GPDP also notes that its investigation of ChatGPT is still ongoing.

What are the legal issues with ChatGPT?

The GPDP’s decisions are quite succinct, which is standard for urgent procedures. So we will look at the issues pointed out by the GPDP from a broad perspective and see what they mean for generative AIs in general.

Before we dive into the thick of things, keep in mind that ChatGPT processes data from two categories of people (or data subjects in the legal jargon): its users, and everyone else whose data ended up in the training set. ChatGPT was trained and is constantly re-trained on both its conversations with users and a larger database of data scraped from the Internet until 2021 (this can be read between the lines of OpenAI's own FAQs).

This scraped database is where the really big problems come from, because the data belongs to millions of people who have nothing to do with ChatGPT at all. They did not give consent or read a privacy policy, and yet their data is used for training. This is difficult to justify under the GDPR.

Legal basis

The main issue at stake is the lack of a legal basis. As our blog explains, if you process personal data, you need a legal basis under the GDPR, which is essentially a legal justification.

Data from users is not a big deal because you can simply collect consent. OpenAI failed to do so in the past, but they fixed the issue after the ban was lifted.

The real problem is personal data from everyone else, and by everyone else, we pretty much mean the world at large.

Based on the second decision from the GPDP, we believe the authority is looking into the legitimate interest of the data controller (that is, OpenAI's interest in providing ChatGPT) as a possible solution. But legitimate interest is a tricky tool under the GDPR because it requires the data controller to ensure that the processing is fundamentally fair, if necessary by implementing safeguards. These requirements are not trivial when dealing with a black box AI such as ChatGPT, so it will be interesting to see what solutions OpenAI comes up with.

Transparency

The GPDP pointed out that ChatGPT did not provide privacy notices. Again, this is easily fixed for ChatGPT users and not-so-easily fixed for everyone else. As the GPDP pointed out, OpenAI will likely need to engage with the media for a large-scale information campaign.

But what about all the other generative AIs? Should they all do the same? As silly as it sounds, should we expect a future where half the advertising space in newspapers is taken up by privacy notices for some new generative AI?

Exercising data rights

Privacy notices are important. They tell you that you have rights such as accessing your data or having it erased, and they explain how you can exercise them. In its second decision, the GPDP ordered OpenAI to provide data subjects with a way to exercise these rights. This will not be trivial, especially regarding the millions of non-users whose data is being processed.

A somewhat similar problem surfaced in the pre-GDPR era when people started asking Google to de-reference their personal data from Google Search. This is how we got Google Spain, a landmark ruling from the EU Court of Justice that strengthened the right to erasure in EU privacy law.

Strict enforcement of the right to erasure and other data rights could alleviate some of the privacy issues raised by AIs. But with Google Search, you can just type your name and see what comes up.

Things are not so simple with generative AIs. You have no way of knowing that an AI hallucinated and provided wildly inaccurate information about you to someone on the other side of the world.

This issue is made even more complicated by the very broad notion of personal data under the GDPR. Preventing ChatGPT from using and outputting direct identifiers (names, forum nicknames, and so on) is not enough to solve these issues, even if we (boldly) assume that no one will be able to jailbreak ChatGPT in the first place. In fact, the notion of personal data is so broad that it would be a nightmare for OpenAI to even figure out which data constitutes personal data within ChatGPT's immense database, which it needs to do to honor requests under the GDPR.

Minor authentication

The GPDP pointed out that OpenAI failed to implement age verification for users, which allowed minors under 13 to use the service and potentially be exposed to age-inappropriate content. This is probably not too relevant to AIs in general, but it’s still worth mentioning for the sake of completeness.

Does the AI Act help with these issues?

ChatGPT raises several legal issues, and it will be interesting to see how the EDPB task force will handle them (update: as of December 2023, no documents have been published by the task force).

But, of course, the burden of regulating AI in the EU is not on the EDPB alone. After long negotiations, the AI Act was finally adopted. It provides an extensive set of rules for AI, including data quality standards and risk management rules.

However, the AI Act is not a privacy law, but rather a market regulation law. Its main focus is regulating the EU market for AI products by prescribing common standards across Member States. Some of its provisions might strengthen privacy, but that is a side effect of the Regulation rather than its main goal.

Bottom line, the AI Act will likely help to some extent, but it is no "AI GDPR", so to speak. At the end of the day, the GDPR will still be crucial for AI. And in some ways, this is a problem.

We took a deep dive into ChatGPT’s issues with the GDPR, but at the end of the day, the problem is not about the technicalities. AI simply does not fit within the conceptual categories of the GDPR. This makes the privacy issues with AI really, really difficult to solve.

The accuracy principle is a good example. Under the GDPR, personal data should be accurate and kept up to date. How do you apply this principle to an AI model that hallucinates left and right and was trained with inaccurate, outdated data? What does the principle even mean for generative AI?

We should not give AI a free pass under the GDPR. But we should acknowledge that the issues at play will not be solved the way issues with GDPR interpretation are usually settled: with some EDPB guidelines here and a Court of Justice precedent there.

The AI Act didn't help either, due to its narrow focus on market regulation.

TL;DR: there is a dangerous gap in European privacy law and it probably won't be filled anytime soon.

Conclusion

We at Simple Analytics believe that privacy matters. This is why we strive to explain privacy news in an accurate and accessible way. We believe there will be no privacy-friendly future without a privacy-aware public.

We also believe that we can all contribute to the cause of privacy. This is why we built a web analytics tool that provides you with all the insights you need, without collecting personal data or tracking visitors. Privacy is our absolute priority, which is why Simple Analytics is built to do more, with less. If this sounds good to you, feel free to give us a try!