Google Analytics server-side implementation

The future of web analytics has been debated more and more over the last few months. The reason is a view, widely supported across Europe, that Google Analytics violates the GDPR.

The French data protection authority (CNIL) openly questioned the legality of using Google Analytics and published a list of privacy-compliant options for organizations to evaluate. One of those is server-side implementation of Google Analytics.

However, server-side implementation is not without drawbacks. In this article, we will examine whether server-side implementation of Google Analytics complies with the GDPR.

Let's dive in!

💡 Oh, one more thing: you can skip all of this by choosing a privacy-friendly Google Analytics alternative like Simple Analytics. No cookie banner is required, it is 100% GDPR-compliant, and you still get the insights you need. Feel free to check it out.

Google Analytics client-side tracking vs server-side tracking
Advantages and disadvantages of server-side tracking
Google Analytics server side vs Legal issues
What data need to be anonymized?
Google Analytics server side performance
Google Analytics server side vs Compliance
Privacy implications of server-side analytics
Final Thoughts

Google Analytics client-side tracking vs server-side tracking

Client-side tracking and server-side tracking are different ways of collecting and processing data about user behavior.

Client-side tracking (or client-side tagging) collects information using scripts that run within the user's browser, such as cookies or pixels.

Server-side tracking (or server-side tagging), on the other hand, collects the data from the server by logging and analyzing requests. This allows the data to be collected without interacting with the user’s device.

In Google Analytics’s case, server-side tracking is a little different. Google Analytics still interacts with the user’s browser by writing and reading cookies. However, the data they collect are sent to your server instead of Google. The server administrator can then decide which datapoints are forwarded to Google, and how. So the server essentially acts as a proxy for the data.
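As a rough sketch of the proxy idea, the server could pass each incoming hit through an allow-list before forwarding anything. The field names below (`event_name`, `page_path`, `client_id`, `ip`, `user_agent`) and the helper `filter_hit` are hypothetical and do not reflect Google's actual payload schema; they only illustrate the filtering step.

```python
from typing import Any, Dict

# Hypothetical allow-list: only these fields ever leave your server.
ALLOWED_FIELDS = {"event_name", "page_path", "client_id"}


def filter_hit(hit: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the hit that contains only allow-listed fields."""
    return {key: value for key, value in hit.items() if key in ALLOWED_FIELDS}


# Example hit as it might arrive from the browser (made-up values).
incoming = {
    "event_name": "page_view",
    "page_path": "/pricing",
    "client_id": "1234567890.1700000000",
    "ip": "203.0.113.7",  # address from a documentation-only range
    "user_agent": "ExampleBrowser/1.0",
}

outgoing = filter_hit(incoming)
print(sorted(outgoing))  # the IP address and user agent are gone
```

An allow-list is used here instead of a block-list so that a field you did not anticipate is dropped by default instead of being forwarded by accident.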

Advantages and disadvantages of server-side tracking

Server-side tracking gives you more control over the information that is sent to your analytics provider. You can decide whether to send personal data at all, and whether to anonymize them, pseudonymize them, or send them in the clear.

There are other advantages to server-side implementation. Your site will load a little faster because the analytics scripts do not need to be loaded by the browser. This improves the user experience and can help with your ranking on search engines.

In addition, your analytics are not negatively impacted by ad-blocking software because they no longer depend on what the user’s browser allows.

The main drawback of server-side setups is the burdensome implementation. You need to find a server if you don’t have one already and keep it safe against cyber threats. You need to set up a user interface to make the data from the server log readable, and find a way to reliably filter out noise. You must also manually update the code every time your analytics software gets an update.

Setting up Google Analytics server-side might cost you more than switching to a privacy-friendly alternative. In fact, the CNIL itself notes that ditching Google Analytics may be a more practical option, because of the costs of a server-side setup.

If you decide to run Google Analytics server-side, you still need cookie consent from your website visitors.

Google Analytics server side vs Legal issues

Any client-side implementation of Google Analytics sends personal data to the US. This is the core of Google Analytics’ legal issues with data transfers (which we discussed in depth on another blog).

Server-side implementation gives the server admin complete control over the data processing and allows you to decide which personal data is forwarded to Google and which is not. In theory, you could set up Google Analytics server-side and prevent Google from accessing visitors' personal data, which would make Google Analytics compliant.

But how does this work in practice? What data should you not forward to Google to make Google Analytics GDPR compliant? And what is the cost in terms of performance?

What data need to be anonymized?

Google Analytics forwards two categories of personal data to the US: IP addresses and cookies. IPs are not a big deal because Google Analytics doesn’t really need them. In fact, Google Analytics 4 does not store them and only uses them for communication. You can implement Google Analytics server-side without forwarding user IPs to Google, with little or no impact on the accuracy of Google Analytics’ insights.

Cookies are a different story. Google Analytics’ cookies include a unique identifier called Client ID. Like IPs, Client IDs are personal data under the GDPR. You cannot withhold this identifier from Google, because Google Analytics needs it in order to function. And you cannot anonymize it, because the ID needs to identify an individual user in order to be useful.

We cannot stress this point enough: under the GDPR, there is no such thing as an anonymized identifier. If a piece of data has been properly anonymized, then it cannot, by definition, function as an identifier anymore. This is a core point of the GDPR and no privacy-enhancing technology can change that.

So, the best you can do to preserve user privacy is hashing Client IDs. But you are not anonymizing anything: you are merely replacing a unique identifier with another.
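To see why hashing does not anonymize anything, here is a minimal sketch using a keyed hash (HMAC-SHA256) from Python's standard library. The choice of HMAC, the helper name `hash_client_id`, and the example key and Client ID are our own assumptions for illustration; neither Google nor the CNIL prescribes this exact construction.

```python
import hashlib
import hmac


def hash_client_id(client_id: str, secret_key: bytes) -> str:
    """Replace a Client ID with a keyed SHA-256 digest (hex string)."""
    return hmac.new(secret_key, client_id.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical secret that never leaves your server.
key = b"example-secret-kept-on-your-server"

first_visit = hash_client_id("1234567890.1700000000", key)
second_visit = hash_client_id("1234567890.1700000000", key)
other_visitor = hash_client_id("9876543210.1700000001", key)

# The same visitor always maps to the same digest, so the digest
# still singles out one visitor: it is just a different identifier.
print(first_visit == second_visit)   # True
print(first_visit == other_visitor)  # False
```

A keyed hash is used here because a plain, unkeyed hash of a short identifier could be recomputed by anyone who can guess the input.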

As an extra safeguard, the CNIL suggests periodically changing hashes. The authority considers rotating hashes a form of pseudonymization, which falls short of proper anonymization but still offers some protection for the data. In fact, strong pseudonymization is mentioned as a possible safeguard for data transfers by the European Data Protection Board (the institution where all European data protection authorities sit).

This makes sense. As the CNIL points out, rotating hashes are still personal data, but they definitely make the user harder to identify because the ID changes periodically.
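One way to picture rotation is to mix the current time period into the hashed message, so the output changes whenever the period changes. The sketch below assumes a monthly rotation and a helper named `rotating_hash`; both are illustrative choices on our part, not a scheme specified by the CNIL.

```python
import hashlib
import hmac
from datetime import date


def rotating_hash(client_id: str, secret_key: bytes, day: date) -> str:
    """Keyed hash of the Client ID that changes every calendar month."""
    period = day.strftime("%Y-%m")  # assumed rotation period: one month
    message = f"{period}:{client_id}".encode("utf-8")
    return hmac.new(secret_key, message, hashlib.sha256).hexdigest()


# Hypothetical secret and Client ID, for illustration only.
key = b"example-secret-kept-on-your-server"
client_id = "1234567890.1700000000"

early_january = rotating_hash(client_id, key, date(2024, 1, 5))
late_january = rotating_hash(client_id, key, date(2024, 1, 28))
february = rotating_hash(client_id, key, date(2024, 2, 2))

print(early_january == late_january)  # True: same period, same pseudonym
print(early_january == february)      # False: new period, new pseudonym
```

In this sketch, whoever holds the key can still recompute the hashes for every period. A stricter variant might generate a fresh random key for each period and discard the old one, but whether that is necessary is a judgment call.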

Google Analytics server side performance

Google Analytics bases its insights on detailed data on the online activity of website visitors. The more data you feed it, the better it performs. If you feed it all the data it would collect client-side, it will perform as well as a client-side setup (and possibly a little better since ad-blockers will be less of an issue). Then again, this makes server-side implementation as invasive as client-side setups, which defeats the whole purpose of implementing Google Analytics server-side in the first place.

On the other hand, withholding some data for privacy reasons will negatively affect the tool’s performance.

For instance, the Client IDs we mentioned earlier allow Google to track visitors by linking multiple events, sessions, and pageviews to the same person. If you access the same website twice, Google Analytics will read your client ID and only count you once as a unique visitor.

Unfortunately, Google Analytics cannot link the metrics to an individual visitor after their ID is re-hashed. This has a significant impact on the accuracy and level of detail of Google Analytics’ insights. For instance, after you rotate the hashes, returning users will get a new hash and will be counted as unique visitors again by Google Analytics, so your unique visitors metric essentially goes out the window.
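A toy simulation makes the effect on unique visitors concrete. It assumes three made-up visitors who each visit once in January and once in February, and a hypothetical monthly rotation implemented with a keyed hash; none of this is Google's own logic.

```python
import hashlib
import hmac

SECRET_KEY = b"example-secret-kept-on-your-server"  # hypothetical key


def rotating_hash(client_id: str, period: str) -> str:
    """Keyed hash of the Client ID that changes whenever `period` changes."""
    message = f"{period}:{client_id}".encode("utf-8")
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


# Three visitors, each visiting once in January and once in February.
visits = [
    (client_id, period)
    for client_id in ("visitor-a", "visitor-b", "visitor-c")
    for period in ("2024-01", "2024-02")
]

# With stable Client IDs, returning visitors are recognized.
unique_with_stable_ids = len({client_id for client_id, _ in visits})

# With monthly rotation, every visitor gets a fresh identifier each month.
unique_with_rotated_hashes = len(
    {rotating_hash(client_id, period) for client_id, period in visits}
)

print(unique_with_stable_ids)      # 3
print(unique_with_rotated_hashes)  # 6
```

Over these two months, the same three people are reported as six unique visitors, and the gap widens with every additional rotation.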

Google Analytics server side vs Compliance

Let’s say you bite the bullet. You go through the hassle of implementing Google Analytics server-side. You take the CNIL’s suggestions to the letter: the only personal information your server forwards are hashed Client IDs, and those hashes are frequently rotated. Are you compliant with the GDPR’s data transfer rules?

Maybe.

As we explained, rotated hashes are pseudonymized data. Pseudonymization is good because it makes identification unlikely (that is to say: it makes it hard to figure out to whom the data belong). This technique is sometimes used by Google Analytics competitors in order to preserve privacy.

But pseudonymization is not the same as anonymization under the GDPR, and for good reason: if an entity controls other data, it might be able to pool them together in order to re-identify the people behind the pseudonymized data.

For instance, if you are active on Reddit, your Reddit username is probably a witty pseudonym. However, if you post enough information on your age, job, birthplace, and so on, it will eventually be possible for other Redditors to figure out who you are.

The age of Big Data is making this sort of data linkage easier and easier. Someone pools vast databases together, and with a little bit of AI black magic, pseudonymous data can often be re-identified.

So how safe are your visitors’ personal data after you hash them and forward them to Google?

Well, Google controls some of the biggest existing databases of personal data. It can rely on exceptional know-how and state-of-the-art technology. It also has a strong incentive to cross-link databases because advertising is its main source of revenue, and profiling is where the real money is.

Even though a visitor may not be identifiable based on their refreshed hash alone, Google could combine this data with data collected elsewhere, for instance via a visitor’s Google account, through Google APIs, or through advertising trackers on Android devices (AAID). This is probably enough to make many visitors identifiable. This, in turn, means that hashes might still be personal data under the GDPR even if the server rotates them.

To be clear: We are not claiming that Google re-identifies pseudonymized and anonymized data, nor do we have any proof of that. Google says it doesn’t. But in our opinion, the company’s track record for privacy suggests some caution.

We are also not claiming that rotating hashes are personal data in the scenario we described. This is for courts and authorities to determine. But a case can certainly be made that they are: after all, in their decisions against Google Analytics, some data protection authorities (including the CNIL itself) acknowledged that the issue of cross-identification was relevant to the cases. This is a good reason to be wary.

Bottom line: it is unclear whether a server-side implementation of Google Analytics ensures compliance with GDPR rules on data transfers, even assuming that you take every precaution possible.

Privacy implications of server-side analytics

Server-side analytics has interesting privacy implications. On paper, it has the potential to be more privacy friendly because it allows you to decide exactly what data you want to collect and whether you want to share it with your analytics provider.

However, data collection could be less transparent. Server-side analytics lets you work on personal data directly from your server log. Your users have no idea this is happening because they can’t just open their browser settings and check their cookies.

Bottom line: transparency is key to a correct implementation of server-side tracking. Users have a right to be informed of what personal data are processed for web analytics and on what legal basis. Implementing server-side analytics in a transparent and compliant way is up to you.

Server-side analytics also has implications for consent. As we explained, Google Analytics’ cookies require consent even when the software is implemented server-side. The same goes for any web analytics software that uses cookies: all non-essential cookies require consent under the ePrivacy Directive, whether analytics are implemented client-side or server-side.

Server-side tagging also allows you to collect other data without interacting with the user’s browser. But this does not mean that you do not need consent.

Final Thoughts

All in all, server-side implementation of Google Analytics is not a viable solution. It is too expensive for small businesses to implement, causes the tool to perform worse than the competition, and does not guarantee that data transfers will be GDPR-compliant.

The core of the issue is that Google Analytics is not a privacy-friendly tool. It is designed to collect fine-grained information by aggressively tracking visitors. Trying to implement Google Analytics in a privacy-friendly way runs counter to its very design: it is a lot of work and yields poor results.

Obviously, we are biased towards our own solution, but switching to a privacy-friendly service is easier, cheaper, and leads to better performance than implementing Google Analytics server-side.

At Simple Analytics, we believe in an independent Internet that is friendly toward website visitors. We make sure it is still possible for website owners to get the insights they need without violating the law. If this resonates with you, feel free to give us a try!