AI that does something useful — not a chatbot in a top hat.
You don't need another ChatGPT subscription. You need AI that handles the specific stuff eating your week — drafting quotes, answering the same customer question for the 40th time today, summarizing a client's history before you pick up the phone. Built into the tools you already use, grounded in your data, priced so the math actually works. From $500.
Is AI actually worth it for your business?
Real talk, up front.
AI probably earns its keep if…
- You're doing the same task over and over — drafting quotes, replying to the same question, sorting invoices — and it's eating hours every week.
- You have a pile of data (orders, tickets, emails, PDFs, call notes) that nobody has time to read, but there are answers buried in it.
- Your team fields the same 10 questions from customers every day, and your website could answer them if it knew how.
- You want to take on more work without hiring another person to handle the admin that comes with it.
- You've played with ChatGPT yourself and thought "there's got to be a way to wire this into my actual business, not a separate browser tab."
Probably hold off if…
- The real problem is you don't have the data yet. AI grounded in nothing is a hallucination machine — fix the data first, then we talk.
- You want "an AI" but you can't point to a specific task it would do. Come back with one real thing on your plate and we'll build one good answer to it.
- Your team already has the process solved in a spreadsheet and it works fine. Save your money — I'll tell you on the call.
- You're chasing AI because it's trendy. Customers don't care what's under the hood — they care whether you pick up the phone.
If you're not sure which side you land on, that's exactly what the free call is for.
What every AI feature I build gets by default.
Not upsells. How I keep AI honest.
- Grounded in your actual data
- Cost tracked from day one
- Answers in under 3 seconds
- You can see what it's doing
- Backup plan when it breaks
- Human approval for the risky stuff
- You own the prompts and data
How I actually build AI features.
Picked per task. Embedded where the work already happens. Measured before it earns a promotion.
Right tool for the task
Claude for the thinking-heavy stuff, OpenAI for fast classification and extraction, smaller cheaper models for the high-volume grunt work. Mixed and matched, not married to one vendor — so when a better model drops next month, you upgrade with a config change, not a rewrite.
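The "config change, not a rewrite" claim can be made concrete. Here's a minimal sketch of task-based model routing; the task labels and model names are illustrative, not a real client setup:

```python
# Route each task to a model via config, not code.
# Model names and task labels here are made up for illustration.

MODEL_ROUTES = {
    "draft_quote":      "claude-sonnet",   # thinking-heavy drafting
    "classify_email":   "gpt-mini",        # fast, cheap classification
    "extract_invoice":  "gpt-mini",        # structured extraction
    "summarize_client": "claude-sonnet",   # long-context summarization
}

DEFAULT_MODEL = "small-cheap-model"        # high-volume grunt work

def pick_model(task: str) -> str:
    """Upgrading to next month's better model is a one-line edit here."""
    return MODEL_ROUTES.get(task, DEFAULT_MODEL)
```

Swapping vendors means editing the dict, and nothing downstream has to change.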
It reads your stuff before it answers
The AI pulls from your docs, PDFs, orders, past customer emails — whatever's relevant — and cites its sources. That's most of the difference between "useful" and "embarrassing." No more made-up prices, made-up policies, or made-up last names.
Tested before it ships
Before anything goes live, I run it against 30–100 real examples from your actual business and score it. Accuracy, cost-per-answer, how often it refuses. If the numbers don't clear the bar, we tune it or call it — you don't pay for production code that fails the test.
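The scoring step above looks roughly like this in miniature. `run_model` is a stand-in for whatever API call the build actually uses, and the 90% bar is an example threshold, not a universal one:

```python
# Sketch of the pre-launch scoring pass: run a candidate prompt/model
# over real examples, then report accuracy, cost, and refusal rate.

def score(examples, run_model, accuracy_bar=0.9):
    correct, total_cost, refusals = 0, 0.0, 0
    for ex in examples:
        answer, cost = run_model(ex["input"])  # returns (answer, $ cost)
        total_cost += cost
        if answer is None:                     # model declined to answer
            refusals += 1
        elif answer == ex["expected"]:
            correct += 1
    n = len(examples)
    report = {
        "accuracy": correct / n,
        "cost_per_answer": total_cost / n,
        "refusal_rate": refusals / n,
    }
    report["ships"] = report["accuracy"] >= accuracy_bar  # clear the bar or tune
    return report
```

If `ships` comes back false, we tune, swap models, or call it.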
Embedded where you already work
AI shows up inside your website, your app, your email drafts, your Power Automate flow — not as another tab to remember. The best AI feature is the one nobody has to log into.
I ship AI every day — mine and clients'.
I write production code with Claude daily, run multi-step AI workflows inside my own Rec Soccer app, and replaced about $40K/year of vendor reporting at a previous role with AI-assisted automation. I'm not selling something I read about — I'm selling what already runs on my own machines.
The AI models powering your build.
I'm not married to one vendor. I pick the model that wins on your task — accuracy, cost, and speed — and swap it out when a better one drops. Usage is billed to your account, not mine, so you never pay a token markup.
Claude
Anthropic
My default for reasoning, long-context work, careful writing, and tool use. Claude Sonnet for most tasks, Opus for the hard ones.
GPT
OpenAI
Cheap, fast classification and structured extraction. My go-to when you need "read this email and pull out the invoice number" at scale.
Gemini
Google
Huge context windows and strong multimodal — reading full PDFs, long meeting transcripts, or whole codebases in one shot. Plays nicely with Google Workspace data.
Llama
Meta
Open-weight models for when you need to run it yourself — on your own servers, in regulated industries, or to keep data off a third-party API entirely.
ElevenLabs
Voice
The best voice I've heard — cloned voices, multilingual, conversational agents. Where AI needs to sound like a person, this is it.
Sora + Nano Banana
OpenAI · Google
Video from Sora, images from Nano Banana (Google's Gemini 2.5 Flash Image). For marketing assets, demo reels, ad creative, product mockups, and social posts — when you don't have a camera crew or a designer in the budget.
sora.com · deepmind.google
Plus specialty services when they earn it — Whisper and Deepgram for speech-to-text, Azure OpenAI or Amazon Bedrock for regulated or enterprise contracts, Hugging Face for the long tail of open models. Whatever wins on your task.
Three ways to start. Pick the shape that fits.
Every package is grounded in your data, evaluated before launch, and yours to own. The ranges are typical — I send a fixed, one-page quote after a discovery call.
Spark
- Timeline
- 1–2 weeks
- Scope
- Single feature
- Languages
- English
- Discovery on the actual task — what's in, what's good, what's broken
- Model + prompt picked and tuned against 30+ real examples
- Structured output (JSON schema) so downstream tools can use it
- Wired into your existing inbox, CRM, sheet, or workflow
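"Structured output so downstream tools can use it" in miniature: the model is forced to return JSON in a fixed shape, then validated before anything downstream touches it. A sketch with made-up field names for a quote-drafting task, not a real client schema:

```python
import json

# Required shape for a hypothetical quote-drafting response.
REQUIRED_FIELDS = {"customer": str, "line_items": list, "total": (int, float)}

def parse_quote(raw_model_output: str) -> dict:
    """Fail loudly on bad output instead of passing garbage downstream."""
    data = json.loads(raw_model_output)        # raises on non-JSON
    for field, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(data.get(field), expected_type):
            raise ValueError(f"bad or missing field: {field}")
    return data
```

Your CRM or spreadsheet gets clean fields, never a paragraph of prose to re-parse.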
Stack
- Timeline
- 3–5 weeks
- Scope
- Multi-step workflow
- Languages
- English + 1
- Everything in Spark
- RAG pipeline over your docs, PDFs, sheets, Notion, or Slack export
- Vector store + chunking strategy tuned for your content
- Source citations on every answer — no "trust me" responses
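The retrieval-plus-citations idea above, boiled down to a toy sketch: rank your chunks by similarity to the question, take the top few, and hand back their sources alongside the context. Real builds use an embedding model and a vector store; here the vectors are just hand-made lists:

```python
import math

def cosine(a, b):
    """Similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def retrieve(question_vec, chunks, k=2):
    """chunks: list of {"text": ..., "source": ..., "vec": ...}.
    Returns the context to ground the answer plus its citations."""
    ranked = sorted(chunks, key=lambda c: cosine(question_vec, c["vec"]), reverse=True)
    top = ranked[:k]
    context = "\n".join(c["text"] for c in top)
    citations = [c["source"] for c in top]     # every answer names its sources
    return context, citations
```

The citations ride along with every answer, which is what kills the "trust me" response.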
Suite
- Timeline
- 6–12 weeks
- Scope
- Multi-feature, app-embedded
- Languages
- 2–12 languages
- Everything in Stack
- Tool use — AI calling your APIs, CRM, calendar, ERP, or custom endpoints
- Multi-step agent loops with retries, fallbacks, and human approval gates
- Multimodal where it earns its keep — voice (STT/TTS), images, PDFs
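An agent loop with retries, a fallback, and a human approval gate can be sketched in a few lines. `call_model`, `approve`, and the action names are placeholders, not a real client integration:

```python
# Actions that never fire without a human saying yes first.
RISKY_ACTIONS = {"send_email", "issue_refund", "update_crm"}

def run_step(call_model, approve, task, max_retries=2, fallback="escalate_to_human"):
    """One agent step: retry on failure, gate risky actions, fall back."""
    for _attempt in range(max_retries + 1):
        action, payload = call_model(task)
        if action is None:                      # model failed, try again
            continue
        if action in RISKY_ACTIONS and not approve(action, payload):
            return ("blocked", action)          # human said no, stop here
        return ("done", action)
    return ("fallback", fallback)               # out of retries
```

Low-stakes actions run straight through; anything on the risky list waits for a person.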
Extras, when they earn it.
Slot these onto any package, or add them later as the use-case grows.
RAG corpus / vector store
$800–$2.5K
Ingest, chunk, embed, and index your docs — plus a re-ingest job for when content changes. Firestore vector, Pinecone, or pgvector.
Voice (STT + TTS)
$1K–$3K
Whisper or Deepgram for speech-to-text, ElevenLabs or OpenAI voices for replies. Phone, browser, or in-app.
Vision / image understanding
$800–$2K
Read receipts, IDs, forms, screenshots, product photos. Extract structured data or answer questions about what's in the picture.
Eval harness
$600–$1.5K
A test set, scoring rubric, and a one-command runner so you can see the impact of every prompt or model change before shipping it.
Prompt-versioning UI
$1K–$2.5K
A small admin panel where you can edit prompts, A/B test variants, and roll back — without redeploying the app.
Monthly AI retainer
$300–$1K / mo
Prompt tuning, cost monitoring, model upgrades when new ones drop, and "why did it do that" investigations — on call.
How an AI project actually goes.
No magic-wand demos. A call, a scoped pilot, real measurement, then production — or an honest stop.
20-minute discovery call
I ask what task you're trying to fix, who does it today, what "good" looks like, and where the data lives. If AI isn't the right tool, I'll say so — sometimes a Power Automate flow or a SQL view is the answer.
Scoped pilot
Within 48 hours you get a fixed quote — one feature, one model, one success metric. We build against 30–100 real examples from your data, not made-up ones.
Measure with evals
Before anything goes live we score it. Accuracy, cost-per-call, p95 latency, refusal rate. If the numbers don't clear the bar we tune, swap models, or call it — you don't pay for production code that fails the test.
Ship and iterate
Live in your app, monitored in real time, with logs you can see. Most clients keep me on a small retainer to tune prompts and ride model upgrades; some don't. Both are fine.
Things people usually ask.
How often does it get things wrong?
Less than the demos you've seen, but never zero — that's why every build is grounded in your data with retrieval, validated with structured outputs, and measured against real examples before it ships. For high-stakes tasks I add a human approval step. The honest answer is: AI gets things wrong, and the engineering is in catching it before the user does.
Whose API keys does it run on?
Yours, billed directly to your OpenAI / Anthropic account. I don't resell tokens or take a markup on usage. You see exactly what every call costs, and if we ever part ways the keys (and the spend) stay with you.
What does it cost to run month to month?
For a Spark feature, often $5–$50/mo in API calls. For a Stack assistant with RAG, typically $30–$300/mo depending on traffic. With prompt caching turned on, repeat queries can cost 50–80% less. I show you the dashboard from day one so there are no surprises.
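The prompt-caching math is simple enough to sketch. The per-token price and the cached-token discount below are illustrative assumptions, not any provider's actual rates; the point is the shape of the savings when most of each prompt is a repeated shared prefix:

```python
# Back-of-envelope input-token cost, with assumed illustrative prices.
# Real per-token rates and cache discounts vary by model and provider,
# and this ignores output tokens for simplicity.

PRICE_PER_1K_INPUT = 0.003   # assumed $ per 1K fresh input tokens
CACHED_DISCOUNT = 0.1        # assume cached prefix tokens bill at 10%

def monthly_cost(calls, input_tokens, cached_tokens=0):
    fresh = input_tokens - cached_tokens
    per_call = (fresh * PRICE_PER_1K_INPUT
                + cached_tokens * PRICE_PER_1K_INPUT * CACHED_DISCOUNT) / 1000
    return calls * per_call
```

With these assumed rates, 1,000 calls a month at 2,000 input tokens each runs $6.00 uncached; if 1,500 of those tokens are a cached shared prefix, it drops to $1.95, about a 67% cut, which is where the 50–80% range comes from.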
What happens to my data?
OpenAI and Anthropic both contractually don't train on API traffic by default. Your data lives in your accounts (Firestore, Pinecone, S3 — whatever you use), the AI calls happen server-side from your infrastructure, and nothing gets logged to a third-party SaaS unless you ask for it. For regulated work I can also use Azure OpenAI or Bedrock for additional contractual coverage.
Which AI model do you use?
All of them, picked per task. Claude (Sonnet/Opus) is my default for reasoning, careful writing, and tool use. OpenAI GPT is great for cheap, fast classification and structured extraction at volume. Google Gemini wins when the context is huge — whole PDFs, long meeting transcripts, entire codebases — or when the data already lives in Google Workspace. Meta Llama is the pick when you need to run the model on your own servers, in a regulated industry, or to keep data entirely off a third-party API. For voice I lean on ElevenLabs; for video and images, Sora and Nano Banana (Google's Gemini 2.5 Flash Image). Same codebase, swap with one config change — so when the next model drops, you upgrade without a rewrite.
Tell me the task you're tired of doing. I'll tell you if AI is the right fix.
First calls run about 20 minutes. You'll leave with a clearer plan — a scoped pilot, a recommendation, or an honest "this is a spreadsheet problem, not an AI problem." All three happen.