Data analysts and digital marketers preparing to leave UA have tons of questions about data sampling in GA4, wondering if they’ll face similar challenges as with the previous version.
So, here it is: there are significant differences in how UA and GA4 sample data and when they do it.
Reporting in GA4 is divided into two categories: standard reports and advanced (exploration) reports.
In UA, your data may get sampled if you apply a secondary dimension to a standard report. GA4, by contrast, lets you use secondary dimensions, comparisons, and filters in a standard report without worrying about data sampling. In this case, you’ll see a green check mark at the top.
GA4 displays a yellow sign with a % symbol on the report for sampled data. Moving the cursor to this icon will reveal the following message:
“This report is based on X% of available data.”
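The X% in that message is the share of available events that the report actually read. A minimal sketch of the arithmetic, assuming you know both counts (the function and parameter names below are illustrative and not part of GA4):

```python
def sampled_share(events_read: int, events_available: int) -> float:
    """Return the percentage of available events a report was based on."""
    if events_available <= 0:
        raise ValueError("events_available must be positive")
    if not 0 <= events_read <= events_available:
        raise ValueError("events_read must be between 0 and events_available")
    return 100.0 * events_read / events_available


def sampling_message(events_read: int, events_available: int) -> str:
    """Format the share the way the hover message words it."""
    pct = sampled_share(events_read, events_available)
    return f"This report is based on {pct:.1f}% of available data."


if __name__ == "__main__":
    # Hypothetical counts: 2.5 million events read out of 10 million available.
    print(sampling_message(2_500_000, 10_000_000))
```

The lower the percentage, the more the reported totals are extrapolated rather than counted.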
While standard reports in UA aren’t typically sampled, your data may be subject to the following sampling thresholds if you apply ad-hoc queries like secondary dimensions to a report:
On the other hand, GA4 always displays unsampled data on standard reports, even if you apply custom parameters, filters, and comparisons. Only the advanced reports may sometimes be sampled. This generally happens when the query covers more than 10 million events and the report differs from a default or standard report.
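The rule of thumb above can be written as a small check. This is only a sketch of the condition as the article describes it: the 10 million figure is taken from the text, and the function name and parameters are hypothetical rather than anything GA4 exposes.

```python
# Threshold taken from the article's description; treat it as an assumption.
EVENT_QUOTA = 10_000_000


def may_be_sampled(event_count: int, is_standard_report: bool,
                   quota: int = EVENT_QUOTA) -> bool:
    """Return True when a report is a candidate for sampling.

    Standard reports are treated as never sampled; an advanced report
    becomes a candidate once the query covers more data than the quota.
    """
    if is_standard_report:
        return False
    return event_count > quota


if __name__ == "__main__":
    print(may_be_sampled(50_000_000, is_standard_report=True))   # standard report
    print(may_be_sampled(50_000_000, is_standard_report=False))  # large exploration
```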
Standard reports in UA have a monthly hit limit of 10 million per property. The daily limit per user is 200K; per session, it’s 500. When these limits are reached, the platform stops collecting data. While this isn’t too worrisome for small websites, platforms with heavy traffic often face sampling trouble because of the hit limits.
Of course, there’s a solution for big businesses in the form of Google Analytics 360, but it’s expensive.
GA4, by contrast, is free and has no hit limits. This is good news for data analysts and digital marketers planning to take their data analytics game to the next level with a premium tool at no cost. However, your data needs to meet specific criteria, because sampled reports haven’t been eliminated completely.
If your data is limited, GA4 samples it, notifying you with the orange sign at the top of the report. This is the opposite of what happens in UA, where data sampling occurs due to excess data.
If your website has limited users after switching to GA4, you’ll have to deal with sampled data. But there’s not much to worry about since the problem will go away once you start generating traffic.
Your reports in GA4 will be assigned specific dimensions, each with several unique values. For instance, for the gender dimension, you may see three different values: male, female, and other. Cardinality refers to the number of values for a dimension. In the gender dimension case, the cardinality will be three.
Because of system limits, Google Analytics may roll up entries in reports with high-cardinality dimensions, grouping the excess values into an “(other)” row.
The cardinality effect may be seen in standard as well as advanced reports in the Analysis tab. Generally, this happens when a dimension has more than 25,000 to 30,000 values in the date range you select.
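To make the idea concrete, here is a minimal sketch that counts a dimension's cardinality and rolls the long tail into an “(other)” row once a row limit is exceeded. The `roll_up` helper and its `max_rows` parameter are illustrative; keeping the most frequent values is an assumption for the example, not a description of how Google Analytics chooses which rows to keep.

```python
from collections import Counter


def cardinality(values):
    """Number of distinct values seen for a dimension."""
    return len(set(values))


def roll_up(values, max_rows):
    """Keep the max_rows most frequent values; group the rest as "(other)"."""
    counts = Counter(values)
    if len(counts) <= max_rows:
        return dict(counts)
    kept = dict(counts.most_common(max_rows))
    # Everything that was not kept is summed into a single row.
    kept["(other)"] = sum(counts.values()) - sum(kept.values())
    return kept


if __name__ == "__main__":
    genders = ["male", "female", "other", "male"]
    print(cardinality(genders))  # three distinct values

    pages = ["/home"] * 5 + ["/pricing"] * 3 + ["/blog/a", "/blog/b"]
    print(roll_up(pages, max_rows=2))
```

With a real high-cardinality dimension, such as page paths with query strings, the “(other)” row can end up holding a large share of the traffic, which is why it matters for reporting accuracy.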
Data sampling in GA4 doesn’t occur in standard reports found under the Reports tab on the left. This includes:
The Demographic report, however, may have sampled data to protect user privacy.
You may think of sampled data as cookieless tracking, an ethical and user-focused way of tracking visitor behaviour. When you have limited data at your disposal, GA4 enables you to identify visitors with their IDs, devices, or user-generated content fields.
Like default reports, exploration reports can be found in the menu on the left. In general, these reports are also unsampled.
The current version of GA4 displays seven explorations:
Data sampling can occur in GA4’s exploration reports if you decide to add dimensions.
The best way to deal with sampled reports is by linking GA4 with BigQuery. Linking gives you access to the raw, event-level data. You can then use data visualization tools like Google Data Studio to create unsampled reports.
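As an illustration of what working with the raw export can look like, the sketch below builds a query that counts events and users per day directly from the exported tables. It assumes the standard export layout (daily `events_YYYYMMDD` tables with `event_date`, `event_name`, and `user_pseudo_id` columns); the project and dataset names are placeholders, and the helper function itself is hypothetical.

```python
import re


def build_daily_events_query(project: str, dataset: str,
                             start: str, end: str) -> str:
    """Build a query over the GA4 export tables for a YYYYMMDD date range."""
    for value in (start, end):
        if not re.fullmatch(r"\d{8}", value):
            raise ValueError("dates must be in YYYYMMDD format")
    return (
        "SELECT event_date, event_name,\n"
        "       COUNT(*) AS events,\n"
        "       COUNT(DISTINCT user_pseudo_id) AS users\n"
        f"FROM `{project}.{dataset}.events_*`\n"
        f"WHERE _TABLE_SUFFIX BETWEEN '{start}' AND '{end}'\n"
        "GROUP BY event_date, event_name\n"
        "ORDER BY event_date, events DESC"
    )


if __name__ == "__main__":
    # Placeholder identifiers; replace with your own project and dataset.
    print(build_daily_events_query("my-project", "analytics_123456",
                                   "20230101", "20230131"))
```

The resulting string can be pasted into the BigQuery console or, if the `google-cloud-bigquery` library is installed and credentials are configured, run with `bigquery.Client().query(sql).result()`. Because the query reads every exported row in the date range, the counts are not sampled.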
With BigQuery, you can conduct a more granular analysis of your data with full confidence in its accuracy. Here’s what to expect:
GA4 is more reliable than UA in terms of data sampling, thanks to the absence of hit limits. This means your reports will usually be based on 100% of your data. In other words, you don’t have to worry about working with a limited percentage of data because a limit or threshold was exceeded, as in UA.
Websites that generate heavy traffic particularly benefit from GA4’s unsampled reports. By contrast, if you have a smaller website with limited users, you may have to deal with sampled data because of user privacy concerns.
Data sampling in GA4 is just one of the reasons for switching to the new platform today rather than tomorrow. You can then integrate it with BigQuery to make the most of your data!
For more information or guidance on this topic, feel free to reach out to our analytics consultants at Napkyn.
Napkyn, a Kepler Group company, is a digital analytics and media solutions provider with more than a decade of experience helping organizations implement and activate high-quality data to make superior business decisions. Trusted by Fortune 1000 companies across North America, Napkyn delivers world-class data management and enterprise enablement solutions to data-driven marketing and technology leaders.
Napkyn is a Google Marketing Platform and Google Cloud Partner, providing technology licensing and modern marketing services that inspire brands and agencies to connect, innovate, and experiment with privacy-forward digital solutions. Learn more about Napkyn at napkyn.com or by following Napkyn on LinkedIn and Twitter.