Data Quality Is the Boring Problem That Kills AI Projects
Riverstone Team
Riverstone Labs

If you have been in a room where an AI demo worked perfectly, and then watched the same idea wilt in production, you have probably seen the real villain. It is rarely “the model was two months old.” It is almost always the data underneath: incomplete CRM records, invoices that do not match any template, customer names spelled six ways, and spreadsheets that were “temporary” three years ago.

Industry surveys of data and analytics leaders routinely put data quality and readiness near the top of the obstacle list for AI success (often ahead of talent and budget). That matches what we see in Australian SMEs: the technology is willing; the sources of truth are not.

This article is intentionally unglamorous. It is also where money gets saved.

## Why AI makes bad data more dangerous, not less

Traditional software fails obviously: missing fields block submission, forms error out, reports show blanks. Many AI systems will fill gaps with plausible language, which is exactly what you do not want in operations.

Automation does not just “do the task faster.” It amplifies whatever structure exists in your data. If your customer IDs are unreliable, you will route work incorrectly at scale. If your invoice formats are chaotic, extraction accuracy will swing week to week.

## The readiness work teams skip (and pay for later)

You do not need perfection. You do need a short list of non-negotiables for the workflow you are automating:

- Identity: a stable key for customers, suppliers, employees, or whatever else the process touches.
- Deduplication: one real-world entity should not be five records without a reason.
- Consistency: the same concepts named the same way (products, stages, categories).
- Completeness on critical fields: if a field drives routing or payment, it must be populated reliably going forward.

Everything else can be staged. If you try to “fix the whole CRM” as a prerequisite, you will never ship anything.

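
For readers who like to see it in code, here is a minimal sketch of what checking those four non-negotiables could look like on an exported customer list. Everything specific in it is an assumption made for illustration: the field names (`customer_id`, `name`, `email`, `stage`), the allowed stage values, and the rule that two records are duplicates when their normalised name and email match. Your own workflow will have its own key, vocabulary, and matching rule.

```python
from collections import Counter, defaultdict

# Hypothetical field names and vocabulary, for illustration only.
KEY_FIELD = "customer_id"
CRITICAL_FIELDS = ["customer_id", "email", "stage"]
ALLOWED_STAGES = {"lead", "qualified", "customer"}


def normalise(text):
    """Lowercase and collapse whitespace so near-identical values compare equal."""
    return " ".join(str(text or "").lower().split())


def readiness_report(records):
    """Count defects against the four non-negotiables for a list of dict records."""
    # Identity: every record needs a key, and no key may be shared.
    keys = [r.get(KEY_FIELD) for r in records]
    missing_keys = sum(1 for k in keys if not k)
    repeated_keys = sum(1 for n in Counter(k for k in keys if k).values() if n > 1)

    # Deduplication: group records sharing a normalised name and email.
    groups = defaultdict(list)
    for r in records:
        match_key = (normalise(r.get("name")), normalise(r.get("email")))
        if any(match_key):  # skip records with neither name nor email
            groups[match_key].append(r)
    duplicate_groups = sum(1 for g in groups.values() if len(g) > 1)

    # Consistency: stage must come from the agreed vocabulary (case is tolerated).
    off_vocabulary = sum(
        1 for r in records if normalise(r.get("stage")) not in ALLOWED_STAGES
    )

    # Completeness: every critical field must be populated.
    incomplete = sum(
        1 for r in records if any(not r.get(f) for f in CRITICAL_FIELDS)
    )

    return {
        "missing_keys": missing_keys,
        "repeated_keys": repeated_keys,
        "duplicate_groups": duplicate_groups,
        "off_vocabulary": off_vocabulary,
        "incomplete": incomplete,
    }


# Made-up sample data: two spellings of one customer, plus one sparse record.
SAMPLE_RECORDS = [
    {"customer_id": "C1", "name": "Acme Pty Ltd", "email": "ap@acme.example", "stage": "lead"},
    {"customer_id": "C2", "name": "ACME  Pty Ltd", "email": "AP@acme.example", "stage": "Lead"},
    {"customer_id": "", "name": "Birch & Co", "email": "", "stage": "prospect"},
]

if __name__ == "__main__":
    print(readiness_report(SAMPLE_RECORDS))
```

A report like this is a conversation starter, not a verdict. The counts tell you where to look; the person who uses the system daily tells you which of those records are genuine exceptions.
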
## You probably do not need a data warehouse first

Enterprise vendors love a warehouse narrative. For many mid-market automations, the right move is narrower: clean and standardise the two or three systems that feed the target process, then expand once value is proven.

Examples:

- Invoice automation needs reliable supplier records and a sane approvals path more than it needs a lakehouse.
- Email triage needs consistent folder conventions and a labelled sample of historical messages more than it needs perfect org-wide taxonomy.

## A practical sequence that actually finishes

1. Audit the workflow’s inputs with someone who uses them daily, not only IT.
2. Fix the worst defects manually if volume allows. One-time cleanup is often cheaper than building automation to repair chaos you could sort in an afternoon.
3. Publish simple data entry standards (mandatory fields, naming rules, who owns changes).
4. Automate against the cleaned baseline.
5. Monitor drift as real life happens: suppliers change layouts, and teams add new “helpful” spreadsheet columns.

## Timeline expectations (so you are not surprised)

For workflows that touch customer, cash, or compliance data, it is reasonable to plan for a substantial share of the project timeline (often roughly a third to half, depending on starting maturity) to go to discovery, cleansing, and validation work. If a vendor promises you can skip that entirely, ask what they are not measuring.

This is not wasted time. It is where you prevent silent wrong answers and rebuilds six weeks after go-live.

## The upside

Data work is boring in the same way foundations are boring: nobody compliments you on rebar, but the building stands. Getting the inputs right is how automation becomes trustworthy enough to delegate, with the right oversight, of course.

## A concrete example: CRM hygiene before “AI sales”

Imagine classifying inbound leads by intent and routing them to the right owner.
The model is not magical if your CRM contains four records for the same person, stale stages, and notes that say “see email” with no link. In that world, automation routes confidently, and to the wrong place. The fix is not a bigger model; it is dedupe rules, required fields at entry, and a short governance note on who can create new accounts. Once the CRM tells the truth often enough, classification becomes a solvable problem.

That pattern repeats across finance documents, support tickets, and project tools: the workflow is only as sane as the object it manipulates.

If you are unsure where to start, pick the system that feeds your highest-volume recurring decision. Cleaning that slice pays rent every week; cleaning a shelf nobody uses is procrastination dressed up as diligence.

## Next step

If you are considering automation, start by asking what your system of record actually contains, not what the org chart says it contains. If you want an independent view of what will break first, book a free 15-minute assessment with Riverstone Labs. Our Diagnose phase includes a straight-talk data audit tied to the workflows you care about.