How To Normalize Marketing Analytics Data To Align With Business Data Sets

By Katerina Naddaf and Ed Finn

In my three years working in the analytics space, a common theme I see among our clients is the divide between Finance, IT, UX, and Marketing teams on the use of web data for decision making. Everyone trusts their own numbers above all others.

So how do we align these teams and get their buy-in to leverage analytics tools as a source of truth for decision making?

You guessed it … Normalization.

What Is Normalization?

The term “Normalization” stems from Normalization Process Theory (NPT). This sociological theory, originating in healthcare, works to explain how new technologies, ways of behaving, and ways of working become routinely integrated and embedded in everyday practice. It also strives to develop understanding around why some processes become normalized while others do not.

However, for the analytics world, Napkyn adapts elements of NPT to assess the accuracy of primary KPIs within an analytics tool against reliable and consistent secondary datasets. This shifts the focus from a functional theory about “how different agents in large social structures slowly make new technology part of everyday routine” to a practical question: “How can we ensure our clients see the accuracy of analytics, so that the organization can shift to using this tool every day?”

Most often, digital data normalization is performed on topline eCommerce data, but it can also be performed on any KPI important to the organization (e.g. lead form submissions).

This is Not About QA … 

While Quality Assurance (QA) is meant to ensure that a measurement system functions appropriately based on test cases, it does not review accuracy against a large set of transactions, leads, or your primary KPIs.

Assumptions of Digital Data Normalization

  • The biases between two datasets, when controlled for as much as possible, should present a difference in total volume of no more than 10% of the previously existing data source. This is an industry standard, though Napkyn’s standard is significantly lower. Your organization should develop its own expectations on what should be considered an acceptable level of difference between data sources after biases have been accounted for.
  • Knowledge of the previously existing data set is available. The dataset’s definitions, biases, and measurement methods are well known.
  • The previously existing dataset has measured the relevant metrics consistently. The dataset cannot measure things inconsistently. For example, if your dataset cannot track mobile phone users, that’s OK because the bias is consistent. However, if the dataset has had portions deleted at the CRM team’s discretion without documentation or review of reasoning, then normalization should not be done.
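
The first assumption comes down to a simple calculation: the gap between the two totals, expressed as a share of the previously existing data source. A minimal Python sketch follows, assuming hypothetical order totals and treating the 10% figure as a default you would replace with your own organization's threshold. The function names are illustrative and not part of any analytics tool.

```python
def relative_difference(reference_total, analytics_total):
    """Gap between the two totals as a fraction of the reference total."""
    if reference_total <= 0:
        raise ValueError("reference_total must be positive")
    return abs(analytics_total - reference_total) / reference_total


def within_tolerance(reference_total, analytics_total, tolerance=0.10):
    """True when the gap is no larger than the accepted tolerance."""
    return relative_difference(reference_total, analytics_total) <= tolerance


# Hypothetical totals for the same period, after adjusting for known biases.
backend_orders = 1200    # previously existing (internal) dataset
analytics_orders = 1130  # analytics tool

gap = relative_difference(backend_orders, analytics_orders)
print(f"Difference: {gap:.1%}")
print("Within 10%:", within_tolerance(backend_orders, analytics_orders))
print("Within 5%:", within_tolerance(backend_orders, analytics_orders, tolerance=0.05))
```

Dividing by the previously existing dataset, rather than by the analytics total, matches the wording of the assumption above. A stricter threshold only requires passing a smaller `tolerance`.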

Steps To Normalize Digital Data 

  1. Identify the metric(s) for normalization. There should be 1-2 related metrics maximum and they should be the most important metrics for performance to the organization.
  2. Ask the internal dataset owner(s) whether they know of any existing biases between the internal dataset and analytics tool. For example: This dataset includes orders from the call center, which the analytics tool does not include.
  3. Ask the internal dataset owner(s) to send several days’ worth of data, adjusted for the dataset biases noted in #2. The length of time and size of the dataset should be sufficient so that there are at least a few hundred instances of the KPI being recorded.
  4. Compare the topline volume of the KPI recorded over the time period in the internal data vs. the analytics tool.
  5. If the difference between the two datasets is within the acceptable range, you are done and ready for the next dataset (be sure to document the details on project closure).
  6. If the difference between the two datasets is outside the acceptable range, analyze the data within the analytics tool to help identify correlations and potential causes. For example: This dataset is missing mostly transactions from the early morning, or This dataset is missing leads from Apple branded devices. Proceed to step 7.
  7. Supply the results to your stakeholder and solicit any assistance in identifying the cause of the difference between the datasets. Rinse & repeat the steps until all important KPIs have been completed.
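
The analysis in step 6 can be sketched as a breakdown of which internal records never appear in the analytics tool, grouped by a dimension such as device or hour. This is a minimal Python illustration, not a prescribed method: the field names (`transaction_id`, `device`) and the sample rows are hypothetical, and it assumes both datasets share a transaction ID.

```python
from collections import Counter


def missing_rate_by(backend_rows, analytics_ids, key):
    """Share of internal records absent from the analytics tool, per segment."""
    seen = set(analytics_ids)
    totals = Counter()
    missing = Counter()
    for row in backend_rows:
        totals[row[key]] += 1
        if row["transaction_id"] not in seen:
            missing[row[key]] += 1
    # A Counter returns 0 for segments that have no missing records.
    return {segment: missing[segment] / totals[segment] for segment in totals}


# Hypothetical internal export and the IDs recorded by the analytics tool.
backend_rows = [
    {"transaction_id": "T1", "device": "Apple"},
    {"transaction_id": "T2", "device": "Apple"},
    {"transaction_id": "T3", "device": "Android"},
    {"transaction_id": "T4", "device": "Android"},
    {"transaction_id": "T5", "device": "Desktop"},
    {"transaction_id": "T6", "device": "Apple"},
]
analytics_ids = ["T3", "T4", "T5", "T6"]

for segment, rate in missing_rate_by(backend_rows, analytics_ids, "device").items():
    print(f"{segment}: {rate:.0%} missing")
```

A segment whose missing rate is far above the others is a candidate cause to bring to the stakeholder in step 7. The same function works for any other field present in the export by changing `key`.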

Common Reasons Why Analytics Data Deviates From Other Data Sets

The hardest part of this process is determining the cause of deviation between datasets. Through creative analysis across numerous client accounts, we’ve discovered causes such as:

  • Underperforming servers, found by reviewing both datasets while controlling for region
  • Exclude filters on analytics reporting views that were filtering out Transaction IDs
  • Time zone differences between tools
  • Google Analytics JavaScript that was taking too long to load on the order confirmation page
  • A massive number of paid clicks from IP addresses listed by the Interactive Advertising Bureau as “Known bots and spiders”
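
Of these causes, time zone differences are among the easiest to check: the same orders can land on different calendar days depending on the reporting time zone of each tool. The Python sketch below shows the effect with hypothetical timestamps and a fixed UTC-5 offset standing in for a tool's reporting time zone. A real comparison would use each tool's configured zone, including daylight saving rules.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def daily_counts(timestamps, report_tz):
    """Count events per calendar day as seen in the given reporting time zone."""
    return Counter(ts.astimezone(report_tz).date().isoformat() for ts in timestamps)


# Hypothetical order timestamps, stored in UTC by the internal system.
utc_orders = [
    datetime(2024, 3, 1, 23, 30, tzinfo=timezone.utc),
    datetime(2024, 3, 2, 1, 15, tzinfo=timezone.utc),
    datetime(2024, 3, 2, 3, 45, tzinfo=timezone.utc),
    datetime(2024, 3, 2, 14, 0, tzinfo=timezone.utc),
]

# Assumed reporting zone for the analytics tool: a fixed UTC-5 offset.
analytics_tz = timezone(timedelta(hours=-5))

print("Internal (UTC):  ", dict(daily_counts(utc_orders, timezone.utc)))
print("Analytics (UTC-5):", dict(daily_counts(utc_orders, analytics_tz)))
```

The totals over the whole period match, but the per-day counts do not, which is why daily comparisons should be made only after both datasets are converted to one time zone.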

No amount of blog reading will teach you how to be a creative data analyst, so make sure you find your best creative thinker to help with this work.

Now you have the steps & thinking to show that two datasets, while not aligned 1:1, are aligned enough for practical purposes. To build on your Normalization efforts, focus next on training on the tools being normalized and proving these tools can serve the needs of other departments. For help with that, or help with digital Normalization, give us a call.

Katerina Naddaf


As an Analyst at Napkyn Analytics, Katerina works with our largest customers to help them understand the performance of their marketing efforts and ecommerce stores.