Entity Resolution for KYB: The Complete Guide
How entity resolution powers accurate business verification—techniques, implementation strategies, and why it's the foundation of effective KYB.
Entity resolution is the technical foundation of effective Know Your Business (KYB) verification. It’s the process of determining when different records refer to the same real-world business—connecting “GTL Services LLC” in Delaware’s registry to “Green Thumb Landscaping” on a merchant application to “GREEN THUMB LANDSCAPE” in payment processor records.
Without entity resolution, KYB verification fails at scale. With sophisticated resolution, legitimate businesses pass verification automatically while fraudulent applications get flagged. This guide explains how entity resolution works, why it matters for KYB, and what separates basic matching from production-grade resolution.
The Problem: Fragmented Business Data
Business information exists across thousands of sources, each with its own format, naming conventions, and data quality:
- State filing: “GTL Services LLC”
- Trade name filing: “Green Thumb Landscaping”
- Payment processor: “GREEN THUMB LANDSCAPE”
- Google Business Profile: “Green Thumb Landscaping & Lawn Care”
- Credit bureau: “GTL SERVICES”
These are all the same business. But how does a system know that?
Why Names Don’t Match
Legal names vs. trade names: Businesses register legally as “XYZ Holdings LLC” but operate publicly as “Joe’s Pizza”
Abbreviations: “Corporation” becomes “Corp” or “Co”; “Limited Liability Company” becomes “LLC” or “L.L.C.”
Stylization: “McDonald’s” vs “McDonalds” vs “MCDONALDS”
Typos and data entry errors: “Acme” becomes “Acmee” or “Acne”
Evolution: Business names change through rebranding, acquisition, or legal restructuring
Beyond Names
Name is just one attribute. Entity resolution must also handle:
Address variations:
- “123 Main Street, Suite 100” vs “123 Main St Ste 100” vs “123 Main St #100”
- Registered agent addresses vs. operating addresses vs. mailing addresses
- Businesses that relocate
Identifier inconsistencies:
- Not all sources include EIN or registration numbers
- Different identifier types across jurisdictions
- Missing or incorrect identifiers
Corporate structures:
- Parent-subsidiary relationships
- Franchises (same brand, different legal entities)
- DBAs that span multiple entities
Why Entity Resolution Matters for KYB
Verification Accuracy
The core KYB question is: “Is this business legitimate?” Answering requires matching the application to authoritative records.
Consider a business applying for a merchant account as “Green Thumb Landscaping” at “456 Main St, Columbus OH.” The Secretary of State record shows “GTL Services LLC” registered at “1209 Orange St, Wilmington DE.”
Without entity resolution: No match found → manual review or rejection
With entity resolution: Match found with high confidence (trade name filing links to legal entity, operating address differs from registered address as expected) → auto-approval
Entity resolution determines whether legitimate businesses pass verification or get stuck in manual review queues.
Straight-Through Processing
Straight-through processing (STP) rates measure how many applications resolve automatically without human intervention. Entity resolution directly impacts STP:
| Resolution Quality | Typical STP Rate |
|---|---|
| Exact match only | 30-40% |
| Basic fuzzy matching | 50-60% |
| Advanced multi-attribute | 70-80% |
| Graph-based with enrichment | 80-90% |
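The difference between 40% and 80% STP is the difference between a sustainable operation and one buried in manual review backlogs. At 40% STP, 60 of every 100 applications need a human reviewer; at 80%, only 20 do, a threefold reduction in review volume.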
Risk Detection
Entity resolution reveals patterns invisible to record-by-record analysis:
Shell company detection: Multiple businesses at the same registered agent address, sharing the same formation date and officer, despite claiming to be independent
Fraud rings: Applications with different business names but connected through shared addresses, phones, or beneficial owners
Sanctions evasion: An entity with a slightly misspelled name that would otherwise match a sanctioned party
Serial fraud: An individual appearing as the beneficial owner of multiple failed businesses
Beneficial Ownership Verification
Tracing ownership requires connecting entities through ownership chains:
Application: "Green Thumb Landscaping"
↓ resolution
Legal Entity: "GTL Services LLC" (Delaware)
↓ ownership lookup
Parent: "Smith Holdings LLC" (Wyoming)
↓ ownership lookup
Beneficial Owner: "Jane Smith" (person)
Without entity resolution, ownership verification stops at the name on the application, which may be a trade name rather than the legal entity that ownership records are filed under.
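The chain above can be sketched as a walk over ownership links. This is a minimal Python sketch, assuming a hypothetical `RESOLVED` map (the output of entity resolution, from application name to legal entity) and a hypothetical `OWNERS_OF` map of direct owners; real ownership data would come from registries or filings. It shows why resolution has to come first: if the application name does not resolve, the walk has nowhere to start.

```python
# Hypothetical data mirroring the example chain above.
RESOLVED = {"Green Thumb Landscaping": "GTL Services LLC"}  # entity resolution output
OWNERS_OF = {
    "GTL Services LLC": ["Smith Holdings LLC"],
    "Smith Holdings LLC": ["Jane Smith"],
}
PERSONS = {"Jane Smith"}  # nodes that are natural persons


def trace_owners(entity, path=()):
    """Return every ownership chain from `entity` up to a natural person."""
    if entity in PERSONS:
        return [list(path) + [entity]]
    if entity in path:  # circular ownership: stop instead of looping forever
        return []
    chains = []
    for owner in OWNERS_OF.get(entity, []):
        chains.extend(trace_owners(owner, path + (entity,)))
    return chains


def owners_for_application(app_name):
    legal_entity = RESOLVED.get(app_name)
    if legal_entity is None:
        return []  # unresolved name: ownership tracing cannot even start
    return trace_owners(legal_entity)


print(owners_for_application("Green Thumb Landscaping"))
# [['GTL Services LLC', 'Smith Holdings LLC', 'Jane Smith']]
```

The cycle check matters in practice because layered holding structures can loop back on themselves, and an unguarded recursive walk would never terminate.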
Entity Resolution Techniques
Deterministic Matching
Match records using exact values of unique identifiers:
Identifiers used:
- EIN (Employer Identification Number)
- State registration number
- DUNS number
- LEI (Legal Entity Identifier)
Example:
Application EIN: 12-3456789
Registry EIN: 12-3456789
→ Exact match
Strengths:
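- Very high precision (false positives occur only when an identifier is wrong, for example a mistyped EIN that happens to belong to another business)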
- Fast execution
- Simple implementation
Limitations:
- Requires identifier presence in both records
- Many records lack standardized identifiers
- Typos in identifiers cause false negatives
- Different identifier types don’t cross-match
Deterministic matching is the starting point but insufficient alone—it typically matches only 20-40% of records.
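A minimal Python sketch of the EIN example above. The dict record layout and the function names are hypothetical. Normalizing formatting before the exact comparison keeps “12-3456789” and “123456789” from becoming a false negative, and returning `None` when either side lacks a valid identifier makes the main limitation explicit: those pairs must fall through to the techniques below.

```python
import re
from typing import Optional


def normalize_ein(raw: Optional[str]) -> Optional[str]:
    """Strip hyphens/spaces; return the 9-digit EIN, or None if it is not well formed."""
    if not raw:
        return None
    digits = re.sub(r"\D", "", raw)
    return digits if len(digits) == 9 else None


def deterministic_match(app: dict, registry: dict) -> Optional[bool]:
    """True/False when both records carry a valid EIN; None when undecidable."""
    a = normalize_ein(app.get("ein"))
    b = normalize_ein(registry.get("ein"))
    if a is None or b is None:
        return None  # no shared identifier: hand off to probabilistic matching
    return a == b  # a single mistyped digit yields False (a false negative)


print(deterministic_match({"ein": "12-3456789"}, {"ein": "123456789"}))  # True
print(deterministic_match({"ein": "12-3456789"}, {"name": "GTL Services LLC"}))  # None
```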
Probabilistic (Fuzzy) Matching
Compare multiple attributes using similarity algorithms and weighted scoring:
Name similarity algorithms:
- Edit distance (Levenshtein): The minimum number of single-character insertions, deletions, or substitutions needed to transform one string into another
- Phonetic matching (Soundex, Metaphone): Match names that sound alike
- Token-based: Compare word sets regardless of order
- TF-IDF: Weight uncommon terms higher than common ones
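The first three of these are small enough to sketch directly. The Python below is illustrative rather than any specific library's API: `levenshtein` is the standard dynamic-programming edit distance, `token_jaccard` is one simple token-based measure (Jaccard overlap of word sets), and `soundex` implements American Soundex. TF-IDF weighting is omitted because it needs a corpus of names to compute term frequencies.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character inserts, deletes, or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (or keep)
        prev = cur
    return prev[-1]


def edit_similarity(a: str, b: str) -> float:
    """Scale edit distance to 0..1, where 1.0 means identical."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest


def token_jaccard(a: str, b: str) -> float:
    """Overlap of word sets, ignoring order and case."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 1.0


_SOUNDEX = {c: d for d, letters in {"1": "bfpv", "2": "cgjkqsxz", "3": "dt",
                                    "4": "l", "5": "mn", "6": "r"}.items()
            for c in letters}


def soundex(word: str) -> str:
    """American Soundex code for a single word (letter + 3 digits)."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    code, prev = letters[0].upper(), _SOUNDEX.get(letters[0], "")
    for c in letters[1:]:
        digit = _SOUNDEX.get(c, "")
        if digit and digit != prev:
            code += digit
        if c not in "hw":  # h/w do not separate equal codes; vowels do
            prev = digit
    return (code + "000")[:4]


print(edit_similarity("acme", "acne"))                                    # 0.75
print(token_jaccard("Green Thumb Landscaping", "Landscaping Green Thumb"))  # 1.0
print(soundex("Robert"), soundex("Rupert"))                               # R163 R163
```

Each measure catches a different kind of variation: edit distance catches “Acme” vs “Acmee”, token overlap catches reordered words, and phonetic codes catch spellings that sound alike. That is why they are usually computed on normalized names and combined rather than used alone.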
Example:
Application name: "Green Thumb Landscaping LLC"
Registry name: "GTL Services LLC"
Trade name: "Green Thumb Landscaping"
Name similarity: Low (0.3)
Trade name similarity: High (0.95)
Address similarity: Medium (0.7)
→ Weighted score: 0.82 → Match
Address standardization:
- Parse addresses into components (street, city, state, zip)
- Standardize abbreviations (St → Street, Ste → Suite)
- Handle unit number variations
- Compare individual components
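A minimal Python sketch of the abbreviation and unit-number steps, applied to the street line only and using the “123 Main Street” variants from earlier. The abbreviation tables are a small hypothetical subset, and parsing city, state, and ZIP into components is left out; a production system would typically rely on a dedicated address parser or postal standardization service for that.

```python
import re

# Illustrative subset of USPS-style abbreviations; a real system needs the full list
# (and care with ambiguous tokens such as "St" meaning "Saint").
STREET_SUFFIXES = {"st": "street", "ave": "avenue", "rd": "road", "blvd": "boulevard", "dr": "drive"}
UNIT_DESIGNATORS = {"suite", "ste", "unit", "apt", "#"}


def standardize_street_line(raw: str) -> dict:
    """Split a street line into a standardized street and a unit number."""
    tokens = re.findall(r"[a-z0-9]+|#", raw.lower())
    street, unit = [], None
    i = 0
    while i < len(tokens):
        if tokens[i] in UNIT_DESIGNATORS and i + 1 < len(tokens):
            unit = tokens[i + 1]  # "Suite 100", "Ste 100", "#100" all become unit "100"
            i += 2
        else:
            street.append(STREET_SUFFIXES.get(tokens[i], tokens[i]))
            i += 1
    return {"street": " ".join(street), "unit": unit}


variants = ["123 Main Street, Suite 100", "123 Main St Ste 100", "123 Main St #100"]
results = [standardize_street_line(v) for v in variants]
print(results[0])  # {'street': '123 main street', 'unit': '100'}
```

Once all three variants reduce to the same components, comparison can happen component by component, so a matching street with a different suite number can score differently from a completely different address.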
Weighted scoring:
Match score =
(name_sim × 0.35) +
(address_sim × 0.25) +
(city_state × 0.15) +
(identifier × 0.25)
Threshold tuning is critical: too low creates false positives, too high creates false negatives.
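The weighted formula translates directly into code. In this minimal Python sketch the weights are the ones from the formula above (they sum to 1.0, so the score stays in [0, 1] and can be compared to a single threshold). The threshold value and the handling of missing features are assumptions: here a feature set to `None` is dropped and the remaining weights are rescaled, instead of counting a missing identifier as a zero.

```python
# Weights from the formula above; the threshold is a hypothetical starting point.
WEIGHTS = {"name_sim": 0.35, "address_sim": 0.25, "city_state": 0.15, "identifier": 0.25}
THRESHOLD = 0.80


def match_score(features: dict) -> float:
    """Weighted sum of similarities in [0, 1], renormalized over present features.

    A feature set to None (e.g. no identifier on the application) is dropped and
    the remaining weights are rescaled, rather than counting the missing value as 0.
    """
    present = {k: w for k, w in WEIGHTS.items() if features.get(k) is not None}
    total = sum(present.values())
    if total == 0:
        return 0.0
    return sum(w * features[k] for k, w in present.items()) / total


# Hypothetical candidate pair with no EIN on the application.
features = {
    "name_sim": 0.95,    # e.g. the better of legal-name and trade-name similarity
    "address_sim": 0.7,
    "city_state": 1.0,
    "identifier": None,
}
score = match_score(features)
print(round(score, 3), score >= THRESHOLD)  # 0.877 True
```

Without the renormalization, the same pair would score about 0.66 and fall below the threshold purely because a field was absent, which is one concrete way missing data turns into false negatives.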
Machine Learning Approaches
Train models on labeled match/non-match pairs to learn complex patterns:
Supervised learning:
- Training data: Human-labeled pairs (match/non-match)
- Features: Similarity scores across multiple attributes
- Models: Random forests, gradient boosting, neural networks
- Output: Match probability
Benefits:
- Captures non-obvious patterns
- Adapts to specific data characteristics
- Can improve over time with feedback
Challenges:
- Requires labeled training data
- Model explainability for compliance
- Ongoing model maintenance
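A minimal sketch of the supervised approach in plain Python. It swaps in logistic regression, a simpler model than the random forests, gradient boosting, or neural networks named above, chosen here because its learned weights can be read directly, which helps with the explainability challenge. The labeled pairs are hypothetical toy data; a real model needs many labeled pairs and a held-out evaluation set.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=1.0, epochs=5000):
    """Batch gradient descent on log loss; returns (weights, bias)."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def match_probability(w, b, x) -> float:
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical labeled pairs: [name_sim, address_sim, identifier_match]; 1 = match
X = [[0.95, 0.90, 1.0], [0.90, 0.70, 1.0], [0.85, 0.95, 0.0], [0.97, 0.80, 1.0],
     [0.30, 0.20, 0.0], [0.50, 0.10, 0.0], [0.20, 0.60, 0.0], [0.40, 0.30, 0.0]]
y = [1, 1, 1, 1, 0, 0, 0, 0]

w, b = train_logistic(X, y)
print([round(wj, 2) for wj in w])  # learned per-feature weights, inspectable for audit
print(match_probability(w, b, [0.925, 0.80, 1.0]) > 0.5)  # True
```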
Graph-Based Resolution
Connect records through relationships, not just attribute similarity:
Relationship types:
- Shared address → linked records
- Same registered agent → linked records
- Common officers/directors → linked records
- Same phone number → linked records
- Ownership connections → linked records
Transitive connections:
Record A shares address with Record B
Record B shares officer with Record C
→ A may be related to C (transitive link)
Graph analysis:
- Connected components (all records that link together)
- Centrality measures (identify hub entities like formation agents)
- Community detection (clusters of related businesses)
Graph-based resolution excels at:
- Revealing corporate structures
- Detecting shell company networks
- Identifying formation agents and registered agent patterns
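The transitive-link and connected-component ideas can be sketched as a breadth-first search over shared attributes. This minimal Python sketch assumes hypothetical records keyed by ID with flat attribute fields. It also counts how many records share each value, a crude degree-centrality check, so that hub values such as a busy registered agent address are set aside instead of merging unrelated businesses into one component. The `max_shared` cutoff is an arbitrary illustrative parameter.

```python
from collections import defaultdict, deque

LINK_KEYS = ("address", "registered_agent", "officer", "phone")


def build_graph(records: dict, max_shared: int = 50):
    """Link records that share any attribute value; report over-shared values as hubs."""
    by_value = defaultdict(set)
    for rid, rec in records.items():
        for key in LINK_KEYS:
            if rec.get(key):
                by_value[(key, rec[key])].add(rid)
    adj, hubs = defaultdict(set), []
    for value, members in by_value.items():
        if len(members) > max_shared:  # e.g. a commercial registered agent's address
            hubs.append((value, len(members)))
            continue
        for rid in members:
            adj[rid] |= members - {rid}
    return adj, hubs


def connected_components(records: dict, adj) -> list:
    """Breadth-first search; each component is a set of transitively linked records."""
    seen, components = set(), []
    for start in records:
        if start in seen:
            continue
        seen.add(start)
        component, queue = set(), deque([start])
        while queue:
            node = queue.popleft()
            component.add(node)
            for nbr in adj[node] - seen:
                seen.add(nbr)
                queue.append(nbr)
        components.append(component)
    return components


# Hypothetical records: A-B share an address, B-C share an officer (the A, B, C example).
RECORDS = {
    "A": {"address": "10 Elm St", "officer": "J. Doe", "registered_agent": "1209 Orange St"},
    "B": {"address": "10 Elm St", "officer": "R. Roe", "registered_agent": "1209 Orange St"},
    "C": {"address": "99 Oak Ave", "officer": "R. Roe"},
    "D": {"address": "5 Pine Rd", "officer": "Q. Public", "registered_agent": "1209 Orange St"},
}
adj, hubs = build_graph(RECORDS, max_shared=2)
print(sorted(sorted(c) for c in connected_components(RECORDS, adj)))  # [['A', 'B', 'C'], ['D']]
print(hubs)  # [(('registered_agent', '1209 Orange St'), 3)]
```

With the hub cutoff removed, the shared registered agent address would pull D into the same component as A, B, and C, which is exactly the kind of spurious link that centrality analysis is meant to expose.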
Building Entity Resolution for KYB
Architecture Components
1. Data Ingestion
- Connect to authoritative sources (Secretary of State APIs, business registries)
- Ingest application data
- Handle various formats and schemas
2. Normalization
- Standardize names (remove punctuation, normalize case)
- Parse and standardize addresses
- Clean and validate identifiers
- Extract structured data from unstructured fields
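A minimal sketch of the name normalization step in Python. The suffix list and function name are hypothetical, and stripping legal suffixes is a design choice: it lets “GTL Services L.L.C.” meet “GTL SERVICES”, but the suffix can itself be a useful signal, so some systems keep both forms.

```python
import re
import unicodedata

# Illustrative, not exhaustive; real suffix lists are jurisdiction-specific.
LEGAL_SUFFIXES = {"llc", "inc", "corp", "corporation", "co", "company", "ltd", "limited"}


def normalize_name(raw: str, strip_suffixes: bool = True) -> str:
    """Case-fold, drop accents and punctuation, optionally strip trailing legal suffixes."""
    text = unicodedata.normalize("NFKD", raw)
    text = "".join(c for c in text if not unicodedata.combining(c))  # "Café" -> "Cafe"
    text = text.lower().replace("&", " and ")
    text = re.sub(r"[^\w\s]", "", text)  # "L.L.C." -> "llc", "McDonald's" -> "mcdonalds"
    tokens = text.split()
    while strip_suffixes and tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


print(normalize_name("GTL Services L.L.C."), "|", normalize_name("GTL SERVICES"))
# gtl services | gtl services
```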
3. Blocking/Indexing
- Group records that might match (blocking keys)
- Avoid comparing every record to every other record
- Common blocks: first N characters of name, zip code, phonetic codes
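A minimal Python sketch of blocking with two hypothetical keys: the first four characters of the normalized name and a 5-digit ZIP code. Four records would normally need six pairwise comparisons; here only two candidate pairs survive. The trade-off is recall: a true match that shares no key (for example, a legal name and a trade name filed under different ZIP codes) is never compared, which is why systems usually combine several keys, including phonetic codes.

```python
from collections import defaultdict
from itertools import combinations


def blocking_keys(rec: dict) -> set:
    """Hypothetical keys: first 4 characters of the normalized name, and 5-digit ZIP."""
    keys = set()
    if rec.get("name"):
        keys.add(("name4", rec["name"][:4]))
    if rec.get("zip"):
        keys.add(("zip5", rec["zip"][:5]))
    return keys


def candidate_pairs(records: dict) -> set:
    """Only records sharing at least one blocking key are compared."""
    blocks = defaultdict(set)
    for rid, rec in records.items():
        for key in blocking_keys(rec):
            blocks[key].add(rid)
    pairs = set()
    for members in blocks.values():
        pairs.update(combinations(sorted(members), 2))
    return pairs


RECORDS = {  # names assumed already normalized
    "r1": {"name": "green thumb landscaping", "zip": "43215"},
    "r2": {"name": "green thumb landscape", "zip": "19801"},
    "r3": {"name": "gtl services", "zip": "43215"},
    "r4": {"name": "acme", "zip": "10001"},
}
print(sorted(candidate_pairs(RECORDS)))  # [('r1', 'r2'), ('r1', 'r3')]
```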
4. Comparison
- Apply similarity algorithms to candidate pairs
- Calculate weighted match scores
- Capture comparison vectors for classification
5. Classification
- Determine match/non-match/maybe
- Apply thresholds or ML models
- Handle edge cases
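A minimal sketch of three-way classification in Python. Using two thresholds, so that scores between them become “maybe” rather than being forced into a decision, mirrors the classic Fellegi-Sunter record-linkage setup of links, possible links, and non-links. The threshold values and the identifier-conflict rule are illustrative assumptions, not recommendations.

```python
from enum import Enum


class Decision(Enum):
    MATCH = "match"
    MAYBE = "maybe"  # route to manual review
    NON_MATCH = "non-match"


def classify(score: float, identifier_conflict: bool = False,
             upper: float = 0.85, lower: float = 0.60) -> Decision:
    """Two-threshold rule; the threshold values are hypothetical."""
    if not lower <= upper:
        raise ValueError("lower threshold must not exceed upper threshold")
    if score >= upper:
        # Edge case (assumed policy): a high score with conflicting identifiers,
        # e.g. two different valid EINs, goes to review instead of auto-matching.
        return Decision.MAYBE if identifier_conflict else Decision.MATCH
    if score >= lower:
        return Decision.MAYBE
    return Decision.NON_MATCH


print(classify(0.92), classify(0.92, identifier_conflict=True), classify(0.40))
# Decision.MATCH Decision.MAYBE Decision.NON_MATCH
```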
6. Clustering
- Group matched records into clusters
- Handle transitive closure (if A=B and B=C, then A=C)
- Create unified entity records
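Transitive closure over pairwise matches is a classic union-find (disjoint-set) problem. A minimal Python sketch with hypothetical record IDs; it groups records but leaves out building the unified entity record. One caveat: closure can chain records that only weakly resemble each other (A matches B, B matches C, yet A and C share little), so clusters are often re-checked for cohesion before they are merged.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint sets with path compression; union() merges two clusters."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:  # point every node on the path at the root
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def cluster(record_ids, matched_pairs):
    """If A=B and B=C, then A, B, and C end up in one cluster."""
    uf = UnionFind()
    for rid in record_ids:
        uf.find(rid)  # register singletons too
    for a, b in matched_pairs:
        uf.union(a, b)
    clusters = defaultdict(set)
    for rid in record_ids:
        clusters[uf.find(rid)].add(rid)
    return list(clusters.values())


print(sorted(sorted(c) for c in cluster(["A", "B", "C", "D"], [("A", "B"), ("B", "C")])))
# [['A', 'B', 'C'], ['D']]
```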
7. API/Output
- Expose resolution as a service
- Return match results with confidence scores
- Provide audit trails for compliance
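What an audit-friendly response might carry can be sketched as a small Python dataclass. Every field name here is a hypothetical example, not a particular product's API. The point is that the service returns more than a yes or no: the confidence, the per-attribute evidence behind it, and the model or threshold version, so a reviewer can later reconstruct why a decision was made.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class ResolutionResult:
    """Hypothetical response shape for a resolution service."""
    query_name: str
    matched_entity_id: Optional[str]  # None when nothing cleared the threshold
    decision: str                     # "match" | "maybe" | "non-match"
    confidence: float
    feature_scores: dict              # per-attribute similarities behind the score
    model_version: str                # which thresholds/model produced the decision
    resolved_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)


result = ResolutionResult(
    query_name="Green Thumb Landscaping",
    matched_entity_id="entity-123",  # hypothetical ID
    decision="match",
    confidence=0.88,
    feature_scores={"name_sim": 0.95, "address_sim": 0.7},
    model_version="thresholds-v1",
)
print(result.to_json())
```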
Tuning for KYB
False positive vs. false negative tradeoffs:
| Error type | What goes wrong | Consequence in KYB |
|---|---|---|
| False positive | Records from different businesses are treated as the same entity | A fraudulent application can inherit a legitimate business's records and be auto-approved (fraud loss) |
| False negative | Records of the same business are not linked | A legitimate business is sent to manual review or rejected (delay or lost customer) |
For KYB, false negatives (legitimate businesses that fail to match) are often more costly than false positives (incorrect matches that must be caught in review). Tune accordingly, but monitor both.
Threshold calibration:
- Start conservative (higher threshold, more manual review)
- Analyze manual review outcomes
- Gradually adjust based on observed precision/recall
- Different thresholds for different risk tiers
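Calibration from review outcomes can be sketched as a threshold sweep. In this minimal Python sketch, each manually reviewed pair contributes its match score and the reviewer's verdict (hypothetical data), and the function returns the lowest threshold that meets a target precision, which is also the threshold with the best recall at that precision. Running it with a stricter target for a higher risk tier yields a separate threshold for that tier. Keep in mind that pairs usually reach review only from the uncertain score band, so these outcomes say little about scores far outside it.

```python
def precision_recall_at(threshold, outcomes):
    """outcomes: (score, reviewer_said_match) pairs from manual review."""
    tp = sum(1 for s, is_match in outcomes if s >= threshold and is_match)
    fp = sum(1 for s, is_match in outcomes if s >= threshold and not is_match)
    fn = sum(1 for s, is_match in outcomes if s < threshold and is_match)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def lowest_threshold_for(outcomes, target_precision):
    """Lowest observed score whose precision meets the target (best recall at that precision)."""
    for t in sorted({s for s, _ in outcomes}):
        precision, recall = precision_recall_at(t, outcomes)
        if precision >= target_precision:
            return t, precision, recall
    return None


# Hypothetical review outcomes: (match score, reviewer confirmed a match?)
OUTCOMES = [(0.95, True), (0.91, True), (0.88, True), (0.84, False),
            (0.80, True), (0.72, False), (0.65, False), (0.60, True)]

print(lowest_threshold_for(OUTCOMES, 0.95))  # (0.88, 1.0, 0.6)
print(lowest_threshold_for(OUTCOMES, 0.80))  # (0.8, 0.8, 0.8)
```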
Handling Edge Cases
New businesses: Recently formed entities may not appear in all data sources yet. Use formation documents plus initial signals.
Sole proprietors: May have no state filing at all. Match on individual identity plus business signals (trade name if registered, address, web presence).
Franchises: Same brand, different legal entities. Match to correct franchisee entity, not franchisor.
Name changes: Historical names may still appear in some sources. Maintain name history and match against all known names.
International entities: Different identifier types, character sets, and registry structures require jurisdiction-aware resolution.
Measuring Resolution Quality
Precision and Recall
Precision: Of records the system says match, what percentage actually match?
Precision = True Positives / (True Positives + False Positives)
Recall: Of records that actually match, what percentage does the system find?
Recall = True Positives / (True Positives + False Negatives)
F1 Score: Harmonic mean balancing precision and recall
F1 = 2 × (Precision × Recall) / (Precision + Recall)
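A short worked example of these formulas, with hypothetical counts from an evaluation set. It also shows why F1 uses the harmonic mean: the result sits closer to the weaker of the two scores than the arithmetic mean (0.825 here) would.

```python
def precision_recall_f1(tp: int, fp: int, fn: int):
    """Compute precision, recall, and F1 from confusion counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical evaluation: 90 correct matches, 10 wrong matches, 30 missed matches.
p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
print(round(p, 3), round(r, 3), round(f1, 3))  # 0.9 0.75 0.818
```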
KYB-Specific Metrics
STP Rate: Percentage of applications resolved without manual review
Match Rate: Percentage of applications successfully matched to authoritative records
Review Yield: Percentage of manual reviews that result in different decisions than the automated suggestion
Time to Decision: How long from application submission to verification decision
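These business metrics are simple ratios over application outcomes. A minimal Python sketch, assuming a hypothetical `Application` record with the fields shown and at least one application in the list. Reporting the median rather than the mean for time to decision is a choice made here, because a few long manual reviews would otherwise dominate the average.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Application:
    matched: bool            # resolved to an authoritative record
    manual_review: bool      # needed a human decision
    suggested: str           # automated suggestion, e.g. "approve"
    final: str               # decision actually taken
    hours_to_decision: float


def kyb_metrics(apps: list) -> dict:
    """Assumes a non-empty list of applications."""
    reviewed = [a for a in apps if a.manual_review]
    return {
        "stp_rate": sum(not a.manual_review for a in apps) / len(apps),
        "match_rate": sum(a.matched for a in apps) / len(apps),
        # Share of reviews where the human decision differed from the suggestion.
        "review_yield": (sum(a.final != a.suggested for a in reviewed) / len(reviewed)
                         if reviewed else 0.0),
        "median_hours_to_decision": median([a.hours_to_decision for a in apps]),
    }


APPS = [  # hypothetical applications
    Application(True, False, "approve", "approve", 0.1),
    Application(True, False, "approve", "approve", 0.2),
    Application(True, True, "approve", "approve", 20.0),
    Application(False, True, "reject", "approve", 30.0),
]
print(kyb_metrics(APPS))
```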
Key Takeaways
- Entity resolution is the foundation of accurate business verification—without it, KYB fails
- Names don’t match across sources; resolution handles variations through multiple techniques
- Deterministic matching is precise but limited; probabilistic matching handles variation; graph-based resolution reveals structure
- Resolution quality directly impacts STP rates—better resolution means more automation
- Tune for your risk profile—balance false positives and false negatives based on cost
- Measure and iterate—track precision, recall, and business metrics to improve over time