Ground Truth

Understand ground truth in business data—verified, authoritative information from primary sources rather than estimates or models.

Ground truth is verified, authoritative data derived from primary sources rather than estimates, models, or aggregated signals. In business verification, ground truth comes from official registries, observed transactions, and validated operating data—not inferred or modeled attributes.

Ground Truth vs. Estimates

Much business data is estimated or modeled:

Attribute | Estimate | Ground Truth
Revenue | Modeled from employee count and industry | Actual transaction data
Employee count | Inferred from office size | Payroll records
Operating status | Assumed from last filing date | Observed recent transactions
Location | Registered address | Verified operating site

Estimates have their place, but high-stakes decisions require ground truth.

Why Ground Truth Matters

Verification Accuracy

Estimates can be wildly wrong:

  • A company might file in Delaware but have zero Delaware presence
  • Revenue models assume industry averages; actual businesses vary enormously
  • A business might be registered but never have actually operated

Ground truth tells you what’s real.

Risk Assessment

Risk models built on estimates inherit their errors:

  • Overestimated revenue → underestimated risk
  • Assumed active status → missed business closures
  • Modeled employee count → wrong business-size classification

Ground truth enables accurate risk scoring.
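
To make the first bullet concrete, here is a toy calculation. It assumes a deliberately simple risk measure (requested credit as a share of annual revenue); the function name and every figure are made up for illustration and do not come from any real scoring model.

```python
def exposure_ratio(requested_credit: float, annual_revenue: float) -> float:
    """Requested credit as a fraction of annual revenue; higher means riskier."""
    if annual_revenue <= 0:
        raise ValueError("annual_revenue must be positive")
    return requested_credit / annual_revenue


# Hypothetical figures: the revenue model overestimates by a factor of five.
requested_credit = 100_000
modeled_revenue = 2_000_000  # estimate from employee count and industry
actual_revenue = 400_000     # ground truth from transaction data

modeled_ratio = exposure_ratio(requested_credit, modeled_revenue)
actual_ratio = exposure_ratio(requested_credit, actual_revenue)

print(f"Exposure using modeled revenue: {modeled_ratio:.0%}")
print(f"Exposure using actual revenue:  {actual_ratio:.0%}")
```

With these invented numbers the estimate makes the applicant look five times safer than the transaction data does, which is the error the risk model would inherit.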

Regulatory Compliance

Regulators expect verified information:

  • KYB (Know Your Business) requires confirming business legitimacy
  • CDD (Customer Due Diligence) requires understanding the customer
  • EDD (Enhanced Due Diligence) requires source-of-funds verification

“We estimated they were legitimate” doesn’t satisfy examiners.

Sources of Ground Truth

Official Registries

  • Secretary of State filings (entity existence, officers, registered agent)
  • IRS records (EIN, tax status)
  • State licensing databases (professional licenses, permits)
  • Court records (liens, judgments, bankruptcies)

Transaction Data

  • Card transaction records (actual revenue, operating status)
  • Banking data (account activity, cash flow)
  • Payment processor records (processing volume)

Direct Verification

  • Site visits (physical presence)
  • Utility records (operational indicators)
  • Business correspondence (verified contact)

Third-Party Validation

  • Credit bureau business records
  • Industry-specific databases
  • Verified review platforms

The Ground Truth Hierarchy

Not all sources are equal:

Tier | Source Type | Example
1 | Government records | Secretary of State, IRS
2 | Financial transactions | Card spend, bank records
3 | Licensed third parties | Credit bureaus, D&B
4 | Self-reported, verified | Applications with document upload
5 | Self-reported, unverified | Form submissions
6 | Modeled/estimated | Revenue models, inferred data

Tiers nearer the top of the list provide stronger ground truth: Tier 1 is the most authoritative, Tier 6 the least.
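
One way to put the hierarchy to work is to record the tier alongside every stored value and, when sources disagree, prefer the most authoritative one. The sketch below assumes Tier 1 is strongest; the `Observation` record and `most_authoritative` helper are hypothetical names, not part of any particular product or library.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List, Optional


class SourceTier(IntEnum):
    """Tiers from the table above; a lower number means a more authoritative source."""
    GOVERNMENT_RECORD = 1
    FINANCIAL_TRANSACTION = 2
    LICENSED_THIRD_PARTY = 3
    SELF_REPORTED_VERIFIED = 4
    SELF_REPORTED_UNVERIFIED = 5
    MODELED = 6


@dataclass(frozen=True)
class Observation:
    """One value for one attribute, tagged with where it came from (hypothetical shape)."""
    attribute: str
    value: str
    tier: SourceTier
    source: str


def most_authoritative(observations: List[Observation], attribute: str) -> Optional[Observation]:
    """Return the observation for `attribute` from the strongest tier, or None if there is none."""
    candidates = [o for o in observations if o.attribute == attribute]
    return min(candidates, key=lambda o: o.tier, default=None)


observations = [
    Observation("address", "1 Main St", SourceTier.SELF_REPORTED_UNVERIFIED, "application form"),
    Observation("address", "2 Oak Ave", SourceTier.GOVERNMENT_RECORD, "Secretary of State"),
]
best = most_authoritative(observations, "address")
print(best.value, "from", best.source)
```

Keeping the tier on each record, rather than only the winning value, also preserves the audit trail showing why one source was preferred over another.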

Ground Truth in Practice

KYB Verification

Ground truth approach:

  1. Match application to Secretary of State record (Tier 1)
  2. Verify operating status via transaction data (Tier 2)
  3. Confirm ownership through registry (Tier 1)
  4. Validate address through multiple sources (Tiers 1-3)

Estimate approach:

  1. Accept stated name and address
  2. Model revenue from industry
  3. Assume active if recently filed
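
The four steps of the ground truth approach can be sketched as a sequence of checks, each labeled with the tier of evidence it relies on. Everything here is an assumption made for illustration: the record shapes, the `verify_kyb` name, the 90-day activity window, and the two-source address threshold. Real registry and transaction feeds will look different.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Application:
    legal_name: str
    owner: str
    address: str


@dataclass
class RegistryRecord:
    """Hypothetical Tier 1 record as it might come back from a registry lookup."""
    legal_name: str
    owners: List[str]


@dataclass
class CheckResult:
    step: str
    tier: str
    passed: bool


def normalize(text: str) -> str:
    """Lowercase, drop commas and periods, and collapse whitespace for comparison."""
    cleaned = text.lower().replace(",", " ").replace(".", " ")
    return " ".join(cleaned.split())


def verify_kyb(
    app: Application,
    registry: Optional[RegistryRecord],
    days_since_last_transaction: Optional[int],
    address_confirmations: int,
    max_inactive_days: int = 90,   # assumed cutoff for "recently active"
    min_address_sources: int = 2,  # assumed meaning of "multiple sources"
) -> List[CheckResult]:
    """Run the four ground-truth checks and report each one separately."""
    name_ok = registry is not None and normalize(registry.legal_name) == normalize(app.legal_name)
    active = (
        days_since_last_transaction is not None
        and days_since_last_transaction <= max_inactive_days
    )
    owner_ok = registry is not None and normalize(app.owner) in {
        normalize(o) for o in registry.owners
    }
    address_ok = address_confirmations >= min_address_sources
    return [
        CheckResult("Match application to registry record", "Tier 1", name_ok),
        CheckResult("Verify operating status via transactions", "Tier 2", active),
        CheckResult("Confirm ownership through registry", "Tier 1", owner_ok),
        CheckResult("Validate address through multiple sources", "Tiers 1-3", address_ok),
    ]


app = Application("Acme Widgets, LLC", "Jane Doe", "1 Main St")
registry = RegistryRecord("ACME WIDGETS LLC", ["Jane Doe"])
for result in verify_kyb(app, registry, days_since_last_transaction=12, address_confirmations=2):
    print(f"{'PASS' if result.passed else 'FAIL'}  {result.step} ({result.tier})")
```

Returning one result per step, rather than a single pass/fail flag, keeps it visible which claims rest on verified sources and which are still unconfirmed.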

When Estimates Are Acceptable

Ground truth isn’t always available or necessary:

  • Low-risk decisions may tolerate estimates
  • Some attributes (future growth) can only be projected
  • Cost/benefit may favor estimates for certain use cases

The key is knowing when you have ground truth and when you don’t.
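
A simple way to act on this is to decide, per type of decision, the weakest source tier you are willing to accept, and then flag any attribute that falls short. The cutoffs below are illustrative assumptions rather than regulatory guidance, and the function names are hypothetical. Tier numbers follow the hierarchy above (1 = government records, 6 = modeled).

```python
from typing import Dict, List

# Weakest (highest-numbered) tier each decision type will accept.
# These cutoffs are assumptions for illustration only.
WEAKEST_ACCEPTABLE_TIER: Dict[str, int] = {"high": 2, "medium": 4, "low": 6}


def is_acceptable(source_tier: int, decision_risk: str) -> bool:
    """True if a value from `source_tier` is strong enough for this decision."""
    return source_tier <= WEAKEST_ACCEPTABLE_TIER[decision_risk]


def attributes_needing_verification(attribute_tiers: Dict[str, int], decision_risk: str) -> List[str]:
    """List the attributes whose best available source is too weak for the decision."""
    return sorted(
        name
        for name, tier in attribute_tiers.items()
        if not is_acceptable(tier, decision_risk)
    )


# Best available tier per attribute for a hypothetical business profile.
profile = {"legal_name": 1, "revenue": 6, "address": 5}

print(attributes_needing_verification(profile, "high"))
print(attributes_needing_verification(profile, "low"))
```

For a low-risk decision the modeled revenue passes; for a high-stakes one, both revenue and address are flagged as needing ground truth before the decision proceeds.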

Key Takeaways

  • Ground truth is verified data from primary, authoritative sources
  • Estimates and models are not ground truth—they’re approximations
  • High-stakes decisions (verification, compliance, risk) require ground truth
  • Sources have different authority levels—government records > models
  • Know what you have—distinguish ground truth from estimates in your data

Related: Entity Verification | Data Enrichment | Operating Status