12 October 2023
Stop losing your GA4 data! Use BigQuery
- BigQuery can be used to store GA4 event level data, offering an alternative to data loss imposed by the GA4 front-end’s data retention settings
- Using BigQuery can help you troubleshoot and investigate common reporting issues as well as help validate implementations
- There are a number of ‘gotchas’ when linking GA4 to BigQuery including export settings and BigQuery table settings
What is BigQuery?
In our Dude where’s my data? GA4 data retention article, we mentioned the variety of data retention options available in GA4. However there is a way to store GA4 data beyond the limits imposed by the data retention settings in the product.
That’s where BigQuery comes in! So what exactly is it?
BigQuery is a flexible and powerful data warehousing tool that allows you to query large quantities of data in short periods of time. For example you can query terabytes within seconds and petabytes within minutes!
Sound good so far? BigQuery can be used in a variety of ways by data professionals and one way we will be touching on, is connecting GA4 to BigQuery to store data for as long as you like!
GA4 BigQuery connector
BigQuery allows you to export GA4 event level data to a Google Cloud BigQuery dataset (data warehouse). There are a multitude of reasons to allow GA4 to export data to BigQuery, however, we will just focus on the data retention benefits for now.
A huge benefit of having your GA4 dataset available in BigQuery is that you can store the data as long as you’d like with no forced data retention period applied.
This makes year on year (YoY) or even multi-year type data comparisons much easier to perform, with the added benefit of no data sampling being applied.
As great as this all sounds, unfortunately we’ve observed that many clients we’ve previously worked with had not yet linked their GA4 property to BigQuery.
This lack of immediate adoption may be for a few reasons noted below:
- Businesses may not yet understand the value proposition of GA4 BigQuery integration - which is one of the reasons we’ve put this article together.
- Google Analytics BigQuery integration under the Universal Analytics product was only available to 360 (paid) users. However, as of GA4, this is no longer the case and BigQuery integration is available to both paid (360) and non-paid GA4 property types.
- There are some modest costs associated with storing and processing data in BigQuery. Learn more about BigQuery pricing.
- Organisations need to be ok with storing this data in the Google Cloud platform and there may be some internal policy and legal team requirements to be discussed before this can go ahead.
Even if you’re not yet sure if you’ll use BigQuery (hint: we highly recommend everyone does!), or don’t yet fully understand the value proposition of it, we still recommend all our GA4 clients to enable BigQuery integration.
As Google Analytics certified partners, we constantly find use cases and reasons to refer to the BigQuery dataset, especially when troubleshooting or investigating client reporting issues and/or validating implementations.
For example: A common client troubleshooting request is to review data discrepancies between GA4 and other third party products. By enabling BigQuery exports, this helps us identify and spot issues more quickly - since we can export all the data we need from BigQuery without having to worry about cumbersome quirks that apply in some contexts when trying to extract large datasets via the front-end reporting UI, GA4 data API or other limitations imposed on exploration reports such as data sampling etc.
BigQuery integration “gotcha’s”
So hopefully we’ve piqued your interest in enabling GA4 and BigQuery integrations, however, there are a few quirks that you should be aware of so you can avoid disappointment down the line.
BigQuery export settings
If you enable exporting, you are presented with 2 checkbox options relating to export frequency.
- Daily - exports a full day of data at a time.
- Streaming - real time exports data as it arrives.
Each export type creates a distinct table name.
The streaming exports will be prefixed, ‘events_intraday_YYYYMMDD’, and these are automatically removed from BigQuery periodically, meaning they have a short retention period. This export setting is usually used if you want to look at the most up to date data freshness eg: monitor a Black Friday sales event as it’s happening.
The daily exported tables, which have a prefix, ‘events_YYYYMMDD’, will remain in BigQuery and are not removed periodically* (see BigQuery Table Retention heading below).
The common misconfiguration problem we’ve seen clients make is best described by the screenshot below - where only the streaming export checkbox is enabled and not the daily export checkbox with it.
In many instances we’ve seen clients only have 1 enabled (streaming) and therefore their historical daily data is not being retained in BigQuery.
By only enabling streaming, your GA4 data won’t remain in BigQuery and you’ll only be able to query recent real time table data.
BigQuery exports are sadly not retrospective
Enabling daily exports is similar to the data retention settings in the GA4 UI as they are both not applied retrospectively and only apply from the date of the change/enablement moving forward. Meaning BigQuery exports only begin from the date of linking/enablement. This is why it’s critical to enable BigQuery integration ASAP, even if you’re not yet completely aware of all the great use cases of this integration.
I liken having this dataset available to a life vest or parachute, it’s a case of better to have it and not need it, than need it and not have it.
For Universal Analytics 360 customers, BigQuery exports did offer a one-off backfill option (with quite a few caveats attached), but it’s not yet clear if GA4 will also support retrospective backfills of GA4 data into BigQuery in the near future. Perhaps if Google does decide to make this available, they may only give the option to paid (360) users.
So right now, Louder’s immediate advice is to enable exports as soon as possible. This will give you the maximum amount of historical GA4 data possible now that Universal Analytics has stopped processing data for the majority of non 360 users.
BigQuery table retention*
Have you spotted a theme yet? Data retention in GA4 has a lot of quirks, so we won’t be surprised if many users fail to get this right.
Unfortunately as you can imagine, we’ve discovered some of these the hard way when working with our own clients. So here’s 1 final gotcha to be aware of when working with data retention in GA4 and BigQuery. This time the issue however is on the Google Cloud side, not so much a GA4 product issue itself.
When linking GA4 and BigQuery with a Google Cloud project that does not yet have an associated Cloud Billing Account linked to it, the BigQuery table setting will default to a maximum retention period of ONLY 60 days.
When you haven’t yet linked to a Google Cloud Billing Account, the default table expiration is set to a hard 60 day limit.
If you try to adjust this setting without being linked to a Cloud Billing Account, you’ll see the following error message.
Here is where things get a bit tricky.
If you link or create a new Cloud Billing Account, you’ll need to remember to proactively go back to your table settings and increase your table expiry to ‘never’ or cap it at ideally a much higher internal (than 60 days) that suits your business. Simply adding a billing account won’t change this setting for you, so be extra vigilant when checking this configuration to ensure you’re getting data retention beyond 60 days.
Don’t forget if you need help with data retention, understanding the value proposition of having GA4 data in BigQuery data, or even pushing your GA4 data to your own non Google data warehouse, Louder is here to help. Get in touch!