How To Solve Google Analytics Transaction Hit Caps Using BigQuery (And Data Studio)
April 10, 2018 -
By Pat Cooney, Implementation Specialist and Training Practice Lead & Hollis Bowman, Senior Analyst and Google Analytics 360 Practice Lead
We have a client who approached us because they were having some normalization issues between what they were seeing in Google Analytics and what was being recorded in the back office data. Some investigation showed us that transactions containing a large number of products were missing in Google Analytics (GA) altogether. Once we saw this, we were able to quickly determine the nature of the problem: These transactions were exceeding the 8kb data cap for GA hits.
Needless to say, our client turned to us for recommendations on how to resolve this issue. A sound resolution that would scale with the business was required to ensure their ecommerce reports, which are used all the way up to their CMO to make crucial investment decisions, reflected the truth of their performance.
There are essentially four main ways to get around hit caps for large transactions:
- Reduce the amount of data getting sent to GA;
- Break up the transaction into multiple hits but use the same transaction ID;
- Break up the transaction into multiple hits but use different transaction IDs;
- Reduce the amount of data getting sent to GA and use data import to send the missing data to GA after the fact.
Our client has very specific reports they create to analyze their data and depend on the data that is getting sent with each transaction to fuel these reports. The first option ‘reducing the amount of data getting sent to GA’ wasn’t going to fly, so we didn’t raise the option with them. We had however worked with them previously to develop some methods to avoid the 8kb data cap when it came to tracking their product impressions. Due to the work done there, our initial thought was to look at breaking up the transactions into multiple hits like we do for product impressions.
Failed Option: Break the Transaction into Multiple Hits
To break up the transaction into multiple hits we have two options available: We can use different transaction IDs for each hit, or we can use the same ID on all hits.
When using different IDs it make sense to take the actual transaction ID and append some type of counter to each hit. For example, if the transaction ID is 12345 and we were going to break that into three hits, those three hits would have the new IDs ‘12345-1’, ‘12345-2’, and ‘12345-3’. The big problem with this solution is that each hit is tracked as its own unique transaction. So in this case, this one transaction is counted as three transactions in GA. Multiply this by the number of transactions per month that a large organization with high sales can get and it’s going to get ugly quickly. This wasn’t going to work with our client as they are very concerned about conversion and this solution would break that, so we couldn’t recommend this solution either.
We spent some time looking at what could be done by breaking up the transaction using the same transaction ID for each new hit. There were some things to be aware of with this method as GA will not deduplicate the transaction, meaning that if we had a transaction revenue of $75, and that transaction was broken into three hits, the final total for that transaction ID in GA would be $225.00. Thankfully GA doesn’t do this with Transaction ID, but we had to get around this being done with Revenue for sure.
A quick experiment showed us that if you set one of your hits (in our case it was the first) with the correct revenue, tax, and shipping data, and set values for every other transaction to 0, you could get around this issue.
The bigger problem however was that, like using different transaction IDs, this still tracked in GA as multiple transactions (even though they all had the same Transaction ID) and broke our ability to properly report on conversions.
Failed Option: Data Import
That left us with looking at what we could do with data import. The transaction hits our client was sending were full of both hit level custom dimensions and metrics, as well as product-level custom dimensions and metrics. Our goal now was to see if we could reduce the amount of data being sent in the transaction hit and upload it back to GA after the fact.
The first hurdle there is that you can’t index your import on Transaction ID. As such, we couldn’t remove any of the hit-level dimensions or metrics; there was no way to upload that data to GA and associate it with the right transaction. That being said, the amount of product-level dimensions and metrics was also very high and we could likely get some good data savings by reducing those to the bare minimum. Pretty much all the data being tracked there we knew existed in their back office data, so we could remove it from the transaction hit and pull it out of the back office data to upload afterwards. To do this properly, we’d need to create a new product level custom dimension that we could use as our key for these uploads. Using the Transaction ID and appending a unique value to represent which product we were identifying made the most sense.
For the three products in transaction 12345, we could set up new product transaction IDs of 12345-1, 12345-2, and 12345-3. Our client is a Google Analytics 360 customer, so we could use the Query Time import to get this data back into GA and get this problem resolved seemingly with ease.
Unfortunately, it wasn’t that easy. Our client already uses data import and that, too, has a data cap. Query Time imports have a cap of 1GB and our client is getting close to this cap already. We knew that the sheer volume of transactions they log each month would blow this cap out of the water, and that was even after accounting for the possibility that we could set up the website so that only transactions we suspected would exceed the 8kb hit cap and let the rest go through untouched. That meant option four proved to not be acceptable either. We could not recommend any of the other potential solutions to our client, as they all had limitations that would ensure they wouldn’t meet the needs of our client.
Solution: BigQuery (and Data Studio)
It was clear that it was time to take our client to the next level of data processing – BigQuery!
The more we discussed it internally the more we became convinced that not only was this the right choice to solve our client’s transaction hit size issue, but BigQuery’s other benefits and services would be so beneficial to our client and their reporting needs that we couldn’t wait to draft up our formal recommendation.
Avoid Data Upload Caps
To address the problem at hand, BigQuery allows us to no longer have to worry about data upload caps. We can strip out all the data in the transaction hits that are already captured in the back office tool and upload it into BigQuery. Once in BigQuery, we can merge that data back with the data from the transaction hit without a worry. This will reduce the size of these transaction hits by 50%-66% at least.
Drive More Meaningful Analysis
Second, BigQuery allows for personally identifying information (PII). We know that our client’s back office data includes PII and we can now merge that data with the transactions for more meaningful analysis.
Be Relevant In-The-Moment
Thirdly, Google Analytics 360’s BigQuery export supports near live, intraday reporting. Not only do we now get those missing transactions back into their analytics database, but they’ll now have the ability to review any of their analytics data in near real time. This will be very powerful for them as they have a few key dates throughout the year where they want to be able to view and react as fast as possible to the data that is coming in on their website and ecommerce performance.
Bonus Points: Visualization in Data Studio
Now that we have a proper recommendation for ensuring that all future transaction data will get properly captured and tracked, as well as the other benefits offered by BigQuery, we wanted to ensure that we were also going to give them a solution that allowed them to easily build reports that their marketing staff could present to their directors. BigQuery on its own doesn’t produce pretty data for easy consumption. Our client is used to building custom reports in GA but we’re now recommending to remove a lot of the data they have come to expect to find in GA. In order to ensure our client can maintain their ability to create these valuable reports easily in a format suitable for presentation, we also recommended that they use Data Studio for their report building.
Data Studio will not only allow them to take the data from BigQuery and build user friendly and accessible reports, but Data Studio also connects with Google Analytics, AdWords, DoubleClick, YouTube, and Google Sheets. This will give them the ability to integrate data from multiple sources and easily create a single report.
As we were working through our options as to how to resolve these large transactions for our client, we were getting concerned that there wasn’t going to be a viable solution, at least not one that we felt comfortable recommending for the long term. Due to their specific reporting needs and volume of transactions, each solution available to us in the Google Analytics tool alone just didn’t work. By moving beyond the Google Analytics tool and employing more advanced and hearty products like Google BigQuery and Google Data Studio, we have been able to find a solution that not only resolves the immediate concern of missing transaction data, but gives our client even more tools at their disposal for analyzing their data and being able to make informed business decisions based on their online performance. These are truly tools that our client can grow into and can grow with our client.