Of 3.6M buyers, only 43,040 (1.2%) have an email and only 1,827 pass verification. The actionable outreach base is extremely thin.
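The funnel arithmetic behind these percentages can be checked with a short sketch. The counts are the ones quoted above; the 3.6M total is rounded, so the results are approximate, and the helper name `pct` is purely illustrative.

```python
# Counts quoted in the text. TOTAL_BUYERS is the rounded 3.6M figure,
# so every derived percentage is approximate.
TOTAL_BUYERS = 3_600_000
WITH_EMAIL = 43_040
VERIFIED = 1_827


def pct(part: int, whole: int) -> float:
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


email_rate = pct(WITH_EMAIL, TOTAL_BUYERS)     # buyers that have an email
verified_rate = pct(VERIFIED, TOTAL_BUYERS)    # buyers that pass verification
pass_through = pct(VERIFIED, WITH_EMAIL)       # emails that pass verification

print(f"{email_rate:.1f}% / {verified_rate:.2f}% / {pass_through:.1f}%")
```

The last figure is the one not stated directly: roughly 4.2% of the buyers that have an email also pass verification.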
No HS code comes from an authoritative source (customs DB etc.): the count is 0. The 32.6% that is filled is entirely AI-derived (estimated); see the box below for exactly what that means. In rfqs, hs_codes / product_category are 100% empty.
85% of enrichment email_verification values are null (unverified). Of the ~8.4k separate email_verifications records, 4,541 are valid (ok) and 3,914 are invalid or unknown.
hs_source = 'derived' does not mean the HS code came from any trade, invoice, or customs record.
It is produced by a two-step guess and only reaches 2-digit chapter granularity:
1. A classifier assigns industry + category (e.g. beauty / skincare). Any misclassification propagates downstream.
2. The industry_taxonomy table maps that industry to an HS chapter, e.g. beauty→33, home_appliances→84,85, food_beverage→02–22. No product-level detail.
Only 15.4% resolve to a single chapter. 41% span 5–16 chapters. food_beverage records all get the same 16-chapter blob (02–22), i.e. "somewhere in food", which is analytically worthless.
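The second step of the derivation amounts to a dictionary lookup, which the following minimal sketch illustrates. The mapping holds only the two fully specified examples from the text; `INDUSTRY_TO_CHAPTERS`, `derive_hs_chapters`, and `is_single_chapter` are hypothetical names, not the pipeline's actual code.

```python
from typing import List, Optional

# Hypothetical excerpt of the industry_taxonomy mapping. Only the industries
# whose chapters are fully spelled out in the text are included; the
# food_beverage members (16 chapters within 02-22) are not listed there,
# so they are deliberately left out here.
INDUSTRY_TO_CHAPTERS = {
    "beauty": ["33"],
    "home_appliances": ["84", "85"],
}


def derive_hs_chapters(industry: Optional[str]) -> List[str]:
    """Map a classified industry to its 2-digit HS chapters.

    Returns an empty list when the industry is unknown or null,
    mirroring the case where no HS can be derived at all.
    """
    if industry is None or industry == "unknown":
        return []
    return list(INDUSTRY_TO_CHAPTERS.get(industry, []))


def is_single_chapter(chapters: List[str]) -> bool:
    """True only for the clean case of exactly one chapter."""
    return len(chapters) == 1


print(derive_hs_chapters("beauty"))           # one chapter
print(derive_hs_chapters("home_appliances"))  # two chapters
print(derive_hs_chapters("unknown"))          # nothing derivable
```

Because the lookup is keyed on industry alone, every record in the same industry receives the same chapter list, which is why the output can never be more specific than the classifier's label.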
Only 181k (5.0%) of all buyers have a clean single-chapter HS — and even those are chapter-level, not a real 6–10 digit code.
2,322,715 buyers (64.5%) have industry = unknown / null, so no HS can be derived at all. The limiter is the upstream classifier, not the HS mapping.
100% of the 6.73M stored values are 2 digits (61 distinct chapters). Customs, tariff, and duty work needs 6- to 10-digit codes (HS6 through HSK10). Zero records qualify.
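A granularity check of this kind can be sketched as a digit count. The 2/4/6-digit levels follow the standard Harmonized System structure (chapter, heading, subheading); treating 8 to 10 digits as a national tariff line is an assumption, and both function names are illustrative.

```python
import re


def hs_granularity(code: str) -> str:
    """Classify an HS code by its number of digits.

    Separators such as dots or spaces are ignored. The 8-10 digit
    "national" bucket is an assumption covering codes like HSK10.
    """
    n = len(re.sub(r"\D", "", code))
    if n == 2:
        return "chapter"
    if n == 4:
        return "heading"
    if n == 6:
        return "subheading"
    if 8 <= n <= 10:
        return "national"
    return "invalid"


def usable_for_customs(code: str) -> bool:
    """Customs, tariff, and duty use needs at least a 6-digit code."""
    return hs_granularity(code) in ("subheading", "national")


print(hs_granularity("33"), usable_for_customs("33"))
print(hs_granularity("3304.99"), usable_for_customs("3304.99"))
```

Under this check every stored 2-digit value lands in the "chapter" bucket and fails `usable_for_customs`.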
481,427 HS-filled records (41%) carry 5+ chapters; 276k food_beverage rows each carry 16. A 16-chapter "code" cannot discriminate products.
~758k classifications sit at confidence ≤0.35, yet HS is derived from them unconditionally — no confidence gate.
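A confidence gate in front of the HS derivation could look like the sketch below. The 0.35 cutoff is taken from the figure above, but choosing it as the gate value is an assumption, as are the `Classification` record and its field names.

```python
from dataclasses import dataclass

# Assumed gate value: classifications at or below this confidence are
# not allowed to produce an HS chapter.
MIN_CONFIDENCE = 0.35


@dataclass
class Classification:
    """Hypothetical shape of one classifier output row."""
    buyer_id: int
    industry: str
    confidence: float


def passes_gate(c: Classification, min_confidence: float = MIN_CONFIDENCE) -> bool:
    """Allow HS derivation only for known industries above the cutoff."""
    if c.industry in ("", "unknown"):
        return False
    return c.confidence > min_confidence


rows = [
    Classification(1, "beauty", 0.92),
    Classification(2, "beauty", 0.35),          # at the cutoff: rejected
    Classification(3, "unknown", 0.99),         # no industry: rejected
    Classification(4, "home_appliances", 0.60),
]
eligible = [c.buyer_id for c in rows if passes_gate(c)]
print(eligible)
```

Rejected rows would keep an empty HS field rather than a low-confidence guess, which makes the fill rate lower but more honest.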
Of 257k rfqs, 52,493 already have an industry and 60,815 a beauty flag — yet hs_codes fill = 0. The derivation step simply never ran on the rfqs side.
The core vertical is clean — 134,438 beauty records map to the single chapter 33. Chapter-level HS is adequate for coarse beauty/non-beauty segmentation, but not for cross-industry matching or trade use.
Of 1.74M failures, most have error=null, so it is ambiguous whether "failed" means a real error or a filtered-out verdict (schema issue).
850k high-score (80%+) matches exist, yet by status 943,348 are pending and only 2 are accepted. The output is never consumed downstream.
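One way to remove the ambiguity is to record an explicit outcome instead of overloading "failed". The sketch below triages existing failed rows; the `verdict` field and the `Outcome` values are hypothetical and do not exist in the current schema as described.

```python
from enum import Enum
from typing import Optional


class Outcome(Enum):
    ERRORED = "errored"      # the run raised a real error
    FILTERED = "filtered"    # the run completed and rejected the record
    AMBIGUOUS = "ambiguous"  # failed, with neither an error nor a verdict


def triage_failed_row(error: Optional[str], verdict: Optional[str]) -> Outcome:
    """Classify a row whose status is 'failed'.

    `verdict` is an assumed column that a filtering step would populate;
    rows with error=null and no verdict cannot be resolved after the fact.
    """
    if error:
        return Outcome.ERRORED
    if verdict is not None:
        return Outcome.FILTERED
    return Outcome.AMBIGUOUS


print(triage_failed_row("timeout", None))
print(triage_failed_row(None, "not_a_buyer"))
print(triage_failed_row(None, None))
```

With today's data most rows would fall into the ambiguous bucket, so the split only becomes meaningful once new runs write the distinction at failure time.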
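Selecting the stranded high-score matches for a downstream consumer is a simple filter, sketched here. The dictionary keys `status` and `score`, the 0 to 1 score scale, and the function name are assumptions made for illustration.

```python
from typing import Dict, List

# Assumed cutoff corresponding to the "80%+" band, on a 0-1 scale.
HIGH_SCORE = 0.80


def ready_for_review(matches: List[Dict], min_score: float = HIGH_SCORE) -> List[Dict]:
    """Return pending matches at or above the score cutoff, best first."""
    picked = [
        m for m in matches
        if m["status"] == "pending" and m["score"] >= min_score
    ]
    return sorted(picked, key=lambda m: m["score"], reverse=True)


sample = [
    {"id": 1, "status": "pending", "score": 0.91},
    {"id": 2, "status": "pending", "score": 0.42},
    {"id": 3, "status": "accepted", "score": 0.95},
    {"id": 4, "status": "pending", "score": 0.80},
]
queue_ids = [m["id"] for m in ready_for_review(sample)]
print(queue_ids)
```

Whatever consumes this queue would then need to write back accepted or rejected, which is the step that currently never happens.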
Of 3.6M buyers, 1.2% (43k) have an email and 0.05% (1.8k) pass verification. Almost no data is actually usable for outreach.
Across all 234k rfqs, hs_codes / product_category fill = 0. The columns underpinning classification/matching are blank.
Of 940k buyer_supplier_matches, only 2 are accepted. 850k high-score matches sit in pending — the pipeline dead-ends.
The classifier failed 1.39M times and match_haiku 327k times, mostly with no error message. The "failed" status is semantically ambiguous (filter vs error). Much of the $4,635 spend went to failed runs.
Zero authoritative-source codes. Not trustworthy for customs/tariff use. A verification layer is needed.
Embedding coverage differs sharply by source (e.g. bizmaps_jp 42% vs usaspending 0.1%). Gaps this large can bias vector search results.
enrichments (rfq side) has 1 row; identity_backup_20260603 and other backup/duplicate schemas remain. Both are cleanup candidates.
Ingestion is concentrated in June bulk imports (bizmaps_jp, australia_asic_abr, etc.). No continuous collection pipeline is running.
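Per-source coverage can be monitored with a small aggregation, sketched below over in-memory rows. The `(source, has_embedding)` row shape and the function name are assumptions; the sample rows are made up and do not reproduce the real percentages.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def embedding_coverage(rows: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Return, per source, the fraction of rows that have an embedding."""
    totals: Dict[str, int] = defaultdict(int)
    embedded: Dict[str, int] = defaultdict(int)
    for source, has_embedding in rows:
        totals[source] += 1
        if has_embedding:
            embedded[source] += 1
    return {source: embedded[source] / totals[source] for source in totals}


# Made-up sample rows, for illustration only.
sample_rows = [
    ("bizmaps_jp", True),
    ("bizmaps_jp", False),
    ("usaspending", False),
    ("usaspending", False),
]
coverage = embedding_coverage(sample_rows)
print(coverage)
```

A source with near-zero coverage is effectively invisible to vector search, so results skew toward whichever sources happen to be embedded.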