

Why Manual Data Checks are Essential for Accurate CLTV Predictions
Learn how to improve Customer Lifetime Value (CLTV) predictions with automated data checks in CrystalValue, plus why human review is still essential

Shreya Banker
Data Scientist
August 20, 2025
Data analytics enthusiast with more than 7 years of experience in data analysis and software programming. I am a highly motivated, versatile IT professional with experience in data analysis, visualization, and database management. I look for the hardest problems to solve, where I can learn and develop the most. I love a challenge and never run from a difficult task. I'm determined to succeed, and I look forward to what life has to offer.
When it comes to predicting Customer Lifetime Value (CLTV), one thing is clear—your model is only as good as your data. It doesn’t matter how advanced your machine learning algorithms are; if the data feeding into them isn’t clean and accurate, your predictions won’t be either.
That’s where automated data checks come in. They help clean up datasets quickly, flagging issues like missing values, duplicates, and outliers. But here’s the catch—automation isn’t perfect. Some mistakes require human judgment to catch, and relying only on automated checks can lead to misleading insights.
Recently, I worked with CrystalValue, a tool built by Google Marketing Solutions that runs automated data quality checks to help businesses clean and prepare their data before running predictive models. It runs on Google Cloud’s Vertex AI and is available as an open-source library on GitHub.
CrystalValue is a fantastic tool for automating data validation, but through real-world experience, I’ve learned that manual checks are still essential. Let’s dive into why automation alone isn’t enough and how to strike the right balance.
What CrystalValue’s Automated Checks Do Well
CrystalValue is designed to catch common data problems that can mess up CLTV predictions. It automatically handles:
Missing Values: Automatically detecting and filling missing values in key columns like Customer ID.
Duplicate Rows: Identifying and removing duplicate transactions that could skew model results.
Outliers: Flagging extreme values in columns such as Price and Quantity that don’t fit within expected ranges.
Data Type Inconsistencies: Ensuring that the correct data types (e.g., categorical vs. numerical) are applied for each column.
The best part? Running these checks takes just a few lines of Python:
# `pipeline` is an initialized CrystalValue pipeline; `data` holds the
# transaction records as a DataFrame.
TABLE_NAME = 'online_retail_data'
pipeline.load_dataframe_to_bigquery(
    data=data,
    bigquery_table_name=TABLE_NAME)
summary_statistics = pipeline.run_data_checks(transaction_table_name=TABLE_NAME)
For basic data cleaning, these automated checks save time and effort. But do they catch everything? Not quite.
Why Automated Data Checks Alone Aren’t Enough
While tools like CrystalValue are great at catching obvious issues, they don’t understand the business context. Here’s where things can go wrong if you rely only on automation.
1. Automation Can’t Catch Context-Specific Mistakes
A tool like CrystalValue doesn’t know your business. It just looks at numbers and patterns—it doesn’t think critically like a human does.
Example 1: Incorrect Product Codes
Imagine a product is accidentally assigned the wrong StockCode in your dataset. CrystalValue won’t flag it—it has no way of knowing if that code is right or wrong. A manual review would catch the mismatch.
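A manual review like this can be partly scripted once you have a trusted reference to compare against. Here is a minimal sketch, where the catalog dict (mapping each StockCode to its expected Description) and the sample transactions are hypothetical:

```python
# Hypothetical catalog: the StockCode -> Description pairs we expect to see.
catalog = {
    "85123A": "WHITE HANGING HEART T-LIGHT HOLDER",
    "71053": "WHITE METAL LANTERN",
}

# Sample transactions; the last row's description disagrees with its code.
transactions = [
    {"StockCode": "85123A", "Description": "WHITE HANGING HEART T-LIGHT HOLDER"},
    {"StockCode": "71053", "Description": "WHITE METAL LANTERN"},
    {"StockCode": "85123A", "Description": "WHITE METAL LANTERN"},
]

# Flag rows whose description does not match the catalog entry for that code.
mismatches = [
    row for row in transactions
    if catalog.get(row["StockCode"]) not in (None, row["Description"])
]
print(mismatches)
```

A script like this doesn't replace the manual review, but it narrows it down to the handful of rows worth a human's attention.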
Example 2: Discounts & Promotions Misclassified as Outliers
Say your store runs a huge Black Friday sale, and products are sold at 50% off. CrystalValue might flag these transactions as "outliers" because the prices look too low compared to normal. But these sales are valid transactions—removing them could completely distort your CLTV predictions.
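One practical safeguard is to cross-check flagged rows against a promotion calendar before dropping anything. A minimal sketch, where the promotion window and the flagged rows are hypothetical:

```python
from datetime import date

# Hypothetical promotion calendar: date ranges when low prices are expected.
promotions = [(date(2024, 11, 29), date(2024, 12, 2))]  # Black Friday weekend

def in_promotion(d):
    """True if the transaction date falls inside any known promotion window."""
    return any(start <= d <= end for start, end in promotions)

# Rows an automated check flagged as price outliers.
flagged = [
    {"InvoiceDate": date(2024, 11, 29), "Price": 4.99},  # Black Friday sale
    {"InvoiceDate": date(2024, 6, 15), "Price": 0.01},   # genuinely suspicious
]

# Keep only the flags that are NOT explained by a promotion.
true_outliers = [row for row in flagged if not in_promotion(row["InvoiceDate"])]
print(true_outliers)
```

The Black Friday row is excused by the calendar; only the unexplained one remains for review.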
2. Business Rules Are Unique—Automation Won’t Catch Everything
Every company has its own way of recording transactions, refunds, and customer behavior. Automated checks don’t account for business-specific logic unless they’re customized.
Example: Refunds & Returns
If refund transactions aren’t properly linked to the original purchases, CrystalValue might not catch it. That means your revenue numbers could be inflated—leading to overestimated CLTV predictions.
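A quick reconciliation pass can surface these gaps. A minimal sketch, assuming refunds carry negative quantities and a hypothetical OriginalInvoice column that should link each refund back to its purchase:

```python
# Hypothetical transaction log: refunds have negative quantities and should
# reference the original purchase via "OriginalInvoice" (an assumed column).
transactions = [
    {"Invoice": "536365", "Quantity": 6, "OriginalInvoice": None},
    {"Invoice": "C536379", "Quantity": -2, "OriginalInvoice": "536365"},
    {"Invoice": "C536380", "Quantity": -1, "OriginalInvoice": None},  # orphan
]

invoices = {t["Invoice"] for t in transactions}

# A refund is "orphaned" if it has no valid link back to an original purchase.
orphan_refunds = [
    t for t in transactions
    if t["Quantity"] < 0 and t["OriginalInvoice"] not in invoices
]
print(orphan_refunds)
```

Orphaned refunds are exactly the rows that silently inflate revenue, so they're worth resolving by hand before modeling.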
Example: Duplicate Customers
CrystalValue can spot duplicate transactions, but what about customers with multiple accounts? If a single person makes purchases using two different email addresses, automation won’t flag them as duplicates. A manual check comparing emails, phone numbers, or names might be needed.
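A lightweight way to start that manual check is to normalize the contact fields and group accounts that collide. A sketch with hypothetical customer records, assuming "+tag" email aliases should count as the same address:

```python
# Hypothetical customer records: the same person under two accounts.
customers = [
    {"CustomerID": 17850, "Email": "Jane.Doe@example.com", "Phone": "613-555-0100"},
    {"CustomerID": 90210, "Email": "jane.doe+shop@example.com", "Phone": "6135550100"},
    {"CustomerID": 13047, "Email": "sam@example.com", "Phone": "613-555-0199"},
]

def normalize_email(email):
    """Lowercase and strip '+tag' aliases so jane.doe+shop == jane.doe."""
    local, _, domain = email.lower().partition("@")
    return local.split("+")[0] + "@" + domain

def normalize_phone(phone):
    """Keep digits only, so formatting differences don't matter."""
    return "".join(ch for ch in phone if ch.isdigit())

# Group accounts that share a normalized (email, phone) pair.
groups = {}
for c in customers:
    key = (normalize_email(c["Email"]), normalize_phone(c["Phone"]))
    groups.setdefault(key, []).append(c["CustomerID"])

duplicates = [ids for ids in groups.values() if len(ids) > 1]
print(duplicates)
```

The output is a list of account clusters to review and, if confirmed, merge before computing per-customer lifetime value.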
3. Some Outliers Are Normal
CrystalValue flags extreme values, but sometimes, outliers aren’t mistakes—they’re just part of how your business operates.
Example: Seasonal Sales Spikes
Retailers always see big jumps in sales during Christmas, Black Friday, or back-to-school season. An automated tool might incorrectly mark these as unusual, but a human would know this is normal.
Example: High-Value Customers Who Buy in Bulk
Some customers regularly make big purchases—they aren’t fraudsters or outliers, they’re just loyal VIP buyers. An algorithm might flag their behavior as unusual, but a human analyst can see the bigger picture.
4. Formatting Issues Still Need Manual Fixing
Automated tools check for data types, but they don’t always catch formatting inconsistencies.
Example: Inconsistent Date Formats
Some rows might have dates in MM/DD/YYYY format, while others use YYYY-MM-DD. CrystalValue might not notice this issue, but it can cause errors in time-based CLTV models.
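Mixed formats like these can be normalized before modeling by trying each known format in turn. A minimal sketch, assuming only the two formats mentioned above appear in the data:

```python
from datetime import datetime

# Mixed date formats as they might appear in a raw export.
raw_dates = ["12/01/2024", "2024-12-01", "01/15/2024"]

# Assumption: only these two formats occur in the dataset.
FORMATS = ("%m/%d/%Y", "%Y-%m-%d")

def parse_date(text):
    """Return a date, trying each known format; fail loudly on anything else."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"Unrecognized date format: {text!r}")

parsed = [parse_date(d) for d in raw_dates]
print(parsed)
```

Failing loudly on unknown formats is deliberate: a date a script can't parse is exactly the kind of row a human should look at.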
Example: Messy Text Fields
Columns with product descriptions or customer feedback often contain typos, abbreviations, or extra spaces that automation won’t fix. These still require manual cleaning.
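Basic normalization (collapsing whitespace, unifying case) can at least make near-duplicate text comparable before a human reviews what remains. A small sketch with hypothetical description strings:

```python
import re

# Messy free-text descriptions: extra spaces, inconsistent case.
descriptions = [
    "  WHITE  hanging heart T-LIGHT holder ",
    "white hanging HEART t-light HOLDER",
]

def clean_text(text):
    """Collapse runs of whitespace and normalize case for comparison."""
    return re.sub(r"\s+", " ", text).strip().upper()

# Both messy variants collapse to a single canonical string.
cleaned = {clean_text(d) for d in descriptions}
print(cleaned)
```

This won't fix typos or abbreviations—those still need manual cleaning—but it shrinks the pile a human has to read.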
How to Combine Automation & Manual Review for Better Results
To get the most accurate CLTV predictions, use automation where it makes sense, but don’t skip manual checks. Here’s a quick checklist:
Cross-check flagged outliers against promotional campaign data.
Verify refund transactions manually against customer IDs.
Look for duplicate customers across multiple IDs (merge using email/name).
Ensure seasonal trends are not incorrectly flagged as anomalies.
Check for inconsistent date formats, currencies, or other data discrepancies.
Final Thoughts
Automated tools like CrystalValue make data cleaning faster and easier, but they don’t replace human judgment. If you rely only on automation, you risk removing valid data, misclassifying transactions, or missing business-specific errors.
Best approach? Use automation for efficiency, but combine it with manual review to ensure your CLTV predictions make sense.