The future of analytics has been debated with growing intensity over the last few months. The debate stems from a view, now widely shared across Europe, that Google Analytics violates the GDPR. Authorities in France, Italy, Austria and, more recently, Finland and Norway have publicly stated that Google Analytics is unlawful.
In their statement, the French data protection authority (CNIL) mentioned a list of privacy-compliant options for organizations to evaluate. One of those is server-side implementation of Google Analytics. The CNIL is one of Europe’s most respected privacy authorities, so their suggestion got some attention from the privacy and marketing communities, and led some to believe that implementing Google Analytics server-side is a bulletproof solution to Analytics’ legal issues with data transfers.
However, server-side implementation is not without drawbacks. In this blog, we will take a closer look at it and attempt to answer two questions:
Does server-side implementation of Google Analytics comply with the GDPR? And is it worth it to implement?
- What are client-side and server-side tracking?
- What are the advantages and disadvantages of server-side tracking?
- Is server-side the solution to Google Analytics’ legal issues?
- What data need to be anonymized?
- How does Google Analytics perform server-side?
- Does server-side implementation of Google Analytics really ensure compliance?
- What are the privacy implications of server-side analytics?
- Is server-side implementation needed with privacy-friendly alternatives?
Let's dive in!
What are client-side and server-side tracking?
Client-side tracking and server-side tracking are different ways of collecting and processing data about user behavior.
Client-side tracking (or client-side tagging) collects information using scripts that run within the user's browser, such as cookies or pixels. Server-side tracking (or server-side tagging), on the other hand, collects the data from the server by logging and analyzing requests. This allows the data to be collected without interacting with the user’s device.
In Google Analytics’ case, server-side tracking is a little different. Google Analytics still interacts with the user’s browser by writing and reading cookies. However, the data it collects are sent to your server instead of directly to Google. The server administrator can then decide which data are forwarded to Google, and how. So the server essentially acts as a proxy for the data.
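The proxy role can be sketched as a filter that sits between the browser and the analytics provider. The following is a minimal Python sketch under stated assumptions, not Google's server-side tagging API: the field names, the `ALLOWED_FIELDS` allow-list, and the `filter_hit` helper are all hypothetical.

```python
# Hypothetical allow-list: only these fields are ever forwarded to the provider.
ALLOWED_FIELDS = {"event_name", "page_path", "timestamp"}


def filter_hit(hit: dict) -> dict:
    """Return a copy of the hit that contains only allow-listed fields."""
    return {key: value for key, value in hit.items() if key in ALLOWED_FIELDS}


# Example hit as it might arrive at your own server (made-up values).
incoming = {
    "event_name": "page_view",
    "page_path": "/pricing",
    "timestamp": 1700000000,
    "ip_address": "203.0.113.42",
    "client_id": "GA1.2.123456789.1700000000",
    "user_agent": "Mozilla/5.0",
}

forwarded = filter_hit(incoming)
print(forwarded)
```

An allow-list is used here rather than a block-list, so that any new field the browser starts sending is withheld by default until someone decides to forward it.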
What are the advantages and disadvantages of server-side tracking?
Server-side tracking gives you more control over the information that is sent to your analytics provider, whether that is Google or another company. You can decide whether to send personal data at all, and whether to anonymize them, pseudonymize them, or send them in the clear.
There are other advantages to server-side implementation. Your site will load a little faster because the analytics scripts do not need to be loaded by the browser. This improves the user experience and can help with ranking on search engines. In addition, your analytics are not negatively impacted by ad-blocking software because they no longer depend on the user’s browser settings (although cookies from Google Analytics and other cookie-based analytics services may still be blocked).
The main drawback of server-side setups is the burdensome implementation. You need to find a server if you don’t have one already and keep it safe against cyber threats. You need to set up a user interface to make the data from the server log readable, and find a way to reliably filter out noise, which is not trivial. You must also manually update the code every time your analytics software gets an update.
Additionally, you need full access to the server log, which many server providers do not offer. This narrows down your choices if you intend to rely on a provider (which is the most affordable option for many companies).
All in all, setting up Google Analytics server-side will cost you a lot more than subscribing to a paid web analytics service that complies with GDPR. In fact, the CNIL itself notes that ditching Google Analytics may be a more practical option, because of the costs of a server-side setup.
Finally, it should be noted that cookies still require user consent, even for server-side tagging. This includes Google Analytics and any other cookie-based analytics service.
Let's dig a bit deeper, shall we?
Is server-side the solution to Google Analytics’ legal issues?
Any client-side implementation of Google Analytics sends personal data to the US. This is the core of Google Analytics’ legal issues with data transfers (which we discussed in depth in another blog post).
Server-side implementation gives the server admin complete control over the data processing and allows you to decide which personal data is forwarded to Google and which is not. In theory, you could set up Google Analytics server-side and prevent Google from accessing visitors' personal data, which would make Google Analytics compliant.
But how does this work in practice? What data should you not forward to Google to make Google Analytics GDPR compliant? And what is the cost in terms of performance?
What data need to be anonymized?
Google Analytics forwards two categories of personal data to the US: IP addresses and cookies. IPs are not a big deal because Google Analytics doesn’t really need them. In fact, Google Analytics 4 does not collect them and only uses them for communication. You can implement Google Analytics server-side without forwarding the user’s IP to Google, with little or no impact on the accuracy of Google Analytics’ insights.
Cookies are a different story. Google Analytics’ cookies include a unique identifier called Client ID. Like IPs, Client IDs are personal data under the GDPR. However, IDs must be sent somehow because Google Analytics is built around them.
Unique identifiers cannot be anonymized either, at least not in the strict sense of the word. Google Analytics’ cookies function because they are unique, and removing their unique part (the Client ID) makes them useless. The best you can do is hash them, but each hash needs to be unique to be of any use, so you are merely replacing one unique identifier with another.
As an extra safeguard, the CNIL suggests periodically changing hashes. The authority considers rotating hashes as a form of pseudonymization- something that falls short of proper anonymization but still offers some protection for the data. In fact, strong pseudonymization is mentioned as a possible safeguard for data transfers by the European Data Protection Board (the institution where all European data protection authorities sit). But there is a price to pay.
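To make the idea of rotating hashes concrete, here is a minimal Python sketch. It is an illustration under assumptions, not the CNIL's prescribed method: the daily rotation period, the `SECRET_KEY` value, and the `pseudonymize` helper are hypothetical choices, and a keyed hash (HMAC-SHA256) is used so that the pseudonym cannot be recomputed without a secret that stays on your server.

```python
import hashlib
import hmac
from datetime import date

# Assumption: a secret that lives only on your server and is never sent to the provider.
SECRET_KEY = b"replace-with-a-long-random-secret"


def pseudonymize(client_id: str, day: date, secret: bytes = SECRET_KEY) -> str:
    """Keyed hash of the Client ID that changes whenever the day changes."""
    # Mixing the date into the message is what makes the pseudonym rotate.
    message = f"{day.isoformat()}:{client_id}".encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


client_id = "GA1.2.123456789.1700000000"  # made-up Client ID

monday = pseudonymize(client_id, date(2023, 1, 2))
monday_again = pseudonymize(client_id, date(2023, 1, 2))
tuesday = pseudonymize(client_id, date(2023, 1, 3))

print(monday == monday_again)  # same visitor, same day: pseudonym is stable
print(monday == tuesday)       # same visitor, next day: pseudonym has changed
```

Within one rotation period the visitor keeps a stable pseudonym, so events from the same day can still be grouped. Across periods the link is broken, which is where the cost described in the next section comes from.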
How does Google Analytics perform server-side?
It depends. Google Analytics bases its insights on detailed data on the online activity of website visitors. The more data you feed it, the better it performs. If you feed it all the data it would collect client-side, it will perform as well as a client-side setup (and possibly a little better since ad-blockers will be less of an issue). Then again, this makes server-side implementation as invasive as client-side setups, which defeats the whole purpose of implementing Google Analytics server-side in the first place. On the other hand, withholding some data for privacy reasons will negatively affect the tool’s performance.
The Client IDs we mentioned earlier allow Google to track visitors by linking multiple events, sessions, and pageviews to the same person. For instance, if you access the same website twice, Google Analytics will read your client ID and only count you once as a unique visitor.
Unfortunately, Google Analytics cannot link the metrics to an individual visitor after their ID is re-hashed. This has a significant impact on the accuracy and level of detail of Google Analytics’ insights. For instance, after you rotate the hashes, returning users will get a new hash and will be counted as unique visitors again by Google Analytics, so your unique visitors metric essentially goes out the window.
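The effect on the unique visitors metric can be shown with a toy example. This is a sketch with made-up visit data and a hypothetical `rotated_id` helper that rotates the hash once per day. It only illustrates the counting problem and says nothing about how Google Analytics computes the metric internally.

```python
import hashlib


def rotated_id(client_id: str, day: int) -> str:
    """Hash of the Client ID that is different for every day (hypothetical rotation scheme)."""
    return hashlib.sha256(f"{day}:{client_id}".encode("utf-8")).hexdigest()


# Made-up visits as (client_id, day) pairs: two real visitors over three days.
visits = [("alice", 1), ("bob", 1), ("alice", 2), ("alice", 3), ("bob", 3)]

# What you would count with stable identifiers.
true_uniques = len({client_id for client_id, _ in visits})

# What the analytics tool sees when every day produces a fresh hash.
reported_uniques = len({rotated_id(client_id, day) for client_id, day in visits})

print(true_uniques)      # 2
print(reported_uniques)  # 5
```

Two people become five "unique visitors" because each returning visit on a new day arrives under a new identifier.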
Does server-side implementation of Google Analytics really ensure compliance?
Let’s say you bite the bullet. You go through the hassle of implementing Google Analytics server-side. You take the CNIL’s suggestions to the letter: the only personal information your server forwards are hashed Client IDs, and those hashes are frequently rotated. Are you compliant with the GDPR’s data transfer rules?
As we explained, rotated hashes are pseudonymized data. Pseudonymization is good because it makes identifying the person behind the data unlikely (that is to say: it makes it hard to figure out to whom the data belong). This technique is sometimes used by Google Analytics competitors in order to preserve privacy. For instance, Fathom and Plausible do this (we at Simple Analytics don’t need to hash because we do not store IPs at all).
However, if an entity controls a lot of data, it might be able to pool them together in order to re-identify pseudonymized data, a technique called fingerprinting.
For instance, if you are active on Reddit, your Reddit username is probably a witty pseudonym. However, if you post enough information on your age, job, birthplace, and so on, it will eventually be possible for other Redditors to figure out who you are. (Yes, this example is too simple, but you get what I mean).
Cross-linking databases is the same thing, but on a wider scale: someone pools vast databases together, and with a little bit of AI black magic, pseudonymous data can sometimes be re-identified.
So how safe is the personal data of your visitors after you hash them and forward them to Google?
Well, Google controls some of the biggest existing databases of personal data. It can rely on exceptional know-how and state-of-the-art technology. It also has a strong incentive to cross-link databases because advertising is its main source of revenue, and profiling is where the real money is.
Even though a visitor may not be identifiable based on their refreshed hash alone, Google could combine this data with data collected elsewhere: for instance, via a visitor’s Google account, through Google APIs, or through advertising identifiers on Android devices (AAID). This is probably enough to make many visitors identifiable. This, in turn, means that hashes might still be personal data under the GDPR even if the server rotates them.
To be clear: We are not claiming that Google re-identifies pseudonymized and anonymized data. Google says it doesn’t. In our opinion, the company’s track record for privacy suggests some caution.
We are also not claiming that rotating hashes are personal data in the scenario we described. This is for courts and authorities to determine. But a case can certainly be made that they are: after all, in their decisions against Google Analytics, some data protection authorities (including the CNIL itself) acknowledged that the issue of cross-identification was relevant to the cases. This is a good reason to be wary.
Bottom line: it is unclear whether a server-side implementation of Google Analytics ensures compliance with GDPR rules on data transfers, even assuming that you take every possible precaution.
What are the privacy implications of server-side analytics?
Server-side analytics has interesting privacy implications. On paper, it has the potential to be more privacy friendly because it allows you to decide exactly what data you want to collect and whether you want to share it.
However, data collection could be less transparent. Server-side analytics lets you work on personal data directly from your server log. Your users have no idea this is happening because they can’t just open their browser settings and check their cookies.
Bottom line, transparency is key to a correct implementation of server-side tracking. Users have a right to be informed of what personal data are processed for web analytics and on what legal basis. Implementing server-side analytics in a transparent and compliant way is up to you.
Server-side tagging also allows you to collect other data without interacting with the user’s browser. But this does not mean that you do not need consent.
Things get a little complex here, but as a rule of thumb, if the data you collect allow you to single out a user among all your visitors, then consent is very likely to be required and you should only collect those data with it. This is the case even if you do not actually use these metrics to single out users: the mere fact that they allow you to do so makes them personal data and, in all likelihood, makes consent mandatory.
On the other hand, you can collect some metrics without consent, provided that they do not allow you to single out a user, even when linked with other metrics. For instance, there is no harm in collecting interactions from your server and using them for analytics as long as these data do not allow you to track users.
Bottom line: if the data allows you to track, be on the safe side and ask for consent.
Is server-side implementation needed with privacy-friendly alternatives?
It depends on the service. In Google Analytics’ case, server-side implementation addresses legal issues with data transfer rules. If a privacy-friendly alternative does not forward personal data to the US, then a server-side implementation is unnecessary to comply with data transfer rules.
However, server-side analytics offers other advantages for compliance. For instance, it can allow you to redact IP addresses before forwarding them. If you are considering an alternative to Google Analytics, you should examine its legal documentation closely and consider the possible benefits of server-side implementation for that specific service.
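The IP redaction mentioned above could look like the following Python sketch, which uses the standard `ipaddress` module. The `mask_ip` helper and the chosen prefix lengths (/24 for IPv4, /48 for IPv6) are assumptions for illustration. Whether a truncated address still counts as personal data depends on the context, so treat this as a way to reduce precision rather than as guaranteed anonymization.

```python
import ipaddress


def mask_ip(raw_ip: str) -> str:
    """Zero out the host part of an IP address before it is forwarded."""
    address = ipaddress.ip_address(raw_ip)
    # Assumed prefix lengths: keep the first 24 bits of IPv4, the first 48 bits of IPv6.
    prefix = 24 if address.version == 4 else 48
    network = ipaddress.ip_network(f"{raw_ip}/{prefix}", strict=False)
    return str(network.network_address)


print(mask_ip("203.0.113.42"))           # 203.0.113.0
print(mask_ip("2001:db8:abcd:1234::1"))  # 2001:db8:abcd::
```

Dropping the address entirely, rather than masking it, is the stricter option when the analytics service does not need any location signal.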
In the specific case of Simple Analytics, server-side implementation is not needed, because we do not collect any personal data from your visitors or forward them outside the EU.
To sum it up:
- Forwarding Client IDs to Google in the clear or as static hashes is effectively the same as implementing Google Analytics client-side and does not make Google Analytics compliant with data transfer rules;
- Not sending Client IDs at all makes Google Analytics completely useless;
- Rotating hashes cripples Google Analytics’ performance and still does not fully ensure compliance with data transfer rules, because the user might still be identifiable;
- All these options are burdensome to implement.
All in all, server-side implementation of Google Analytics does not seem like a viable solution. It is too expensive for small businesses to implement, causes the tool to perform worse than the competition, and does not guarantee that data transfers will be GDPR-compliant.
The core of the issue is that Google Analytics is not a privacy-friendly tool. It is designed to collect fine-grained information by aggressively tracking visitors. Trying to implement Google Analytics in a privacy-friendly way runs counter to its very design. That’s why doing so is a lot of work and yields poor results.
Obviously, we are biased towards our own solution, but switching to a privacy-friendly service is easier, cheaper, and leads to better performance than implementing Google Analytics server-side. At Simple Analytics, we believe in an independent internet that is friendly toward website visitors. We make sure it is still possible for website owners to get the insights they need without violating the law. If this resonates with you, feel free to give us a try!