95% of AI Projects Fail. Here's What the 5% Do Differently.
Riverstone Team
Riverstone Labs

If you read the business press on generative AI, you get two stories at once. Spending is up. Pilots are everywhere. At the same time, a growing stack of research and survey reporting says most of that activity is not showing up in the numbers owners care about: gross margin, cycle time, error rates, and customer outcomes.
MIT’s work on what some outlets call the “GenAI divide” has been summarised bluntly: a large majority of enterprise generative AI initiatives are not producing measurable profit-and-loss impact, while a small subset is. Separately, market research in 2025 has pointed to many organisations scaling back or shelving initiatives that looked promising in slides but did not survive contact with real operations. (Always check the primary source before you quote a headline number in your own materials—the point here is directional: lots of motion, limited durable value.)
None of that means the underlying technology is a hoax. It means the failure mode is familiar. It is the same failure mode that showed up in earlier waves of “digital transformation”: buying capability without fixing who owns the workflow, what “good” means in numbers, and what happens on day thirty after the consultants leave.
When AI is bought like a software licence—rolled out wide, anchored to a generic use case, and measured in “adoption” rather than outcomes—results disappoint. When it is treated as operational infrastructure, the conversation changes. You start with a process that already costs real hours or real dollars, you define the metric you will move, and you accept that data quality and integration are part of the budget, not a surprise line item.
Research that compares delivery approaches has been reported as showing higher success rates for vendor-led implementations than for purely internal builds in some enterprise samples. That is easy to misread as “hire a vendor and you win.” The more useful reading is narrower: structured delivery, clear scope, and workflow focus correlate with getting to production. A disciplined internal team can do the same. A chaotic vendor engagement can fail like any other IT project. The differentiator is operational discipline, not the logo on the invoice.
Across case studies and research summaries, three patterns show up again and again in initiatives that actually stick.
First, they anchor on a specific business problem and a measurable definition of success. Not “we want AI,” but “we will cut invoice processing time from twelve minutes to three on average,” or “we will route tier-one enquiries without a human for cases above a confidence threshold, and measure escalation quality weekly.” If you cannot state the metric, you are not ready to spend.
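To make that first pattern concrete, here is a minimal Python sketch of the two examples in the paragraph above: the cut in invoice processing time and confidence-threshold routing. The threshold value, the `Enquiry` type, and the function names are assumptions chosen for illustration, not a recommended configuration. The sketch treats a score equal to the threshold as good enough to automate, which is itself a choice you would want to make deliberately.

```python
from dataclasses import dataclass

# Hypothetical threshold; in practice, set it from your own reviewed samples.
CONFIDENCE_THRESHOLD = 0.85


@dataclass
class Enquiry:
    enquiry_id: str
    confidence: float  # model's confidence in its suggested answer, 0.0 to 1.0


def route(enquiry: Enquiry, threshold: float = CONFIDENCE_THRESHOLD) -> str:
    """Return 'auto' when confidence meets the threshold, otherwise 'human'."""
    return "auto" if enquiry.confidence >= threshold else "human"


def percent_reduction(before_minutes: float, after_minutes: float) -> float:
    """Percentage cut in average handling time."""
    return (before_minutes - after_minutes) / before_minutes * 100


# Made-up enquiries, for illustration only.
enquiries = [Enquiry("E-1", 0.97), Enquiry("E-2", 0.62), Enquiry("E-3", 0.85)]
routes = {e.enquiry_id: route(e) for e in enquiries}

print(routes)                    # which enquiries skip the human queue
print(percent_reduction(12, 3))  # twelve minutes down to three is a 75% cut
```

Stating the target as a number (here, a 75% cut in average processing time) is what lets you check weekly whether the project is on track.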
Second, they invest in data readiness before they obsess over model choice. Messy CRM entries, inconsistent invoice formats, and duplicate customer records do not become less messy because a larger language model is plugged in. They become more expensive to fix at runtime. The boring work—standardisation, deduplication, clear ownership of the system of record—is where many projects either pay their dues early or fail late.
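As a rough illustration of the deduplication work described above, the following Python sketch normalises customer names and email addresses before comparing records. The field names and sample records are invented for the example, and exact matching on normalised values is only the simplest possible rule; real customer data usually needs fuzzier matching and a decision about which record is the system of record.

```python
import re


def normalise_email(email: str) -> str:
    """Trim surrounding whitespace and lower-case the address."""
    return email.strip().lower()


def normalise_name(name: str) -> str:
    """Collapse runs of whitespace, trim, and ignore letter case."""
    return re.sub(r"\s+", " ", name).strip().casefold()


def deduplicate(records: list[dict]) -> list[dict]:
    """Keep the first record seen for each normalised (name, email) pair."""
    seen = set()
    unique = []
    for record in records:
        key = (normalise_name(record["name"]), normalise_email(record["email"]))
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Made-up records: the first two differ only in spacing and letter case.
customers = [
    {"name": "Acme Pty Ltd", "email": "accounts@acme.example"},
    {"name": "  ACME  Pty Ltd ", "email": "Accounts@Acme.example "},
    {"name": "Birch & Co", "email": "hello@birch.example"},
]
cleaned = deduplicate(customers)
print(len(customers), "records in,", len(cleaned), "records out")
```

Running this kind of check before any model is involved shows how much cleanup the project is really signing up for.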
Third, they design human oversight in from day one for decisions that affect customers, cash flow, and risk. Automation that touches payments, commitments to customers, or compliance-sensitive workflows needs explicit checkpoints: who reviews, when review is mandatory, and what gets logged. Oversight added after a public mistake is expensive in every sense.
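One way to picture those checkpoints is a small gate that every automated action passes through. The Python sketch below is a simplified illustration under stated assumptions: the category names, the `checkpoint` function, and the log format are hypothetical, and passing a reviewer name is taken to mean that person has already signed off. It covers the three questions in the paragraph above: who reviews, when review is mandatory, and what gets logged.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

# Hypothetical policy: categories that always need a named human reviewer.
MANDATORY_REVIEW = {"payment", "customer_commitment", "compliance"}


@dataclass
class Action:
    action_id: str
    category: str


@dataclass
class AuditLog:
    entries: list = field(default_factory=list)

    def record(self, action: Action, decision: str, reviewer: Optional[str]) -> None:
        """Log every decision, automated or not, with a UTC timestamp."""
        self.entries.append({
            "action_id": action.action_id,
            "category": action.category,
            "decision": decision,
            "reviewer": reviewer,
            "logged_at": datetime.now(timezone.utc).isoformat(),
        })


def checkpoint(action: Action, reviewer: Optional[str], log: AuditLog) -> str:
    """Hold sensitive actions until a named reviewer has signed off."""
    needs_review = action.category in MANDATORY_REVIEW
    if needs_review and reviewer is None:
        decision = "held_for_review"
    elif needs_review:
        decision = "released_with_review"
    else:
        decision = "auto_released"
    log.record(action, decision, reviewer)
    return decision


log = AuditLog()
print(checkpoint(Action("A-1", "payment"), None, log))         # held_for_review
print(checkpoint(Action("A-2", "payment"), "j.nguyen", log))   # released_with_review
print(checkpoint(Action("A-3", "email_triage"), None, log))    # auto_released
```

The useful property is that nothing sensitive can go out without a named person attached, and the log exists before anyone needs it.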
Most damage happens between the polished demo and the first month of live volume. Demos use curated examples. Production hits ambiguous subject lines, handwritten PDFs, edge cases in your industry, and the fact that your stack is Xero plus a CRM plus email plus a spreadsheet nobody admits to. Integrations break when upstream APIs change. Accuracy drifts when seasonality or product lines shift.
“Pilot purgatory” is what happens when nobody has the authority to either kill the pilot or promote it to an owned production system with monitoring, runbooks, and a named internal owner. Pilots that linger without a decision burn credibility with staff who are asked to humour another tool.
For Australian SMEs in particular, the constraint is rarely “access to models.” It is time: who will own the workflow after launch, how interruptions will be handled when something misclassifies an email or a bill, and whether the savings show up in the roster or only in a slide. Projects that respect those constraints tend to stay small enough to finish and large enough to matter.
You do not need a research lab. You need a production mindset: one workflow, one metric, clear data inputs, and documented human checkpoints where stakes warrant them. If an initiative cannot explain those four things, it is not ready for meaningful spend—whatever the vendor’s quarter-end discount looks like.
If you want that discipline applied to your operations with senior-led delivery and ROI discussed up front, book a free assessment and we will map where automation is likely to earn its keep—and where it is not.