NZ · GMT+13
I build agentic systems, and engineer everything else the same way. Same doctrine at both scales - declarative skills over deterministic tools, durable workflows, verification against real running systems before anything's declared done, humans at the review gate. At EasyAudit I shipped the AI Compliance Officer: an agent platform operating safely inside a regulated environment. Before that, six years at Soul Machines solving the hardest version of the problem - orchestrating parallel backend agents through a real-time voice interface. Full-stack, startup-minded, focused on context engineering - structuring information so systems (and teams) can reason effectively.
What I actually build
The patterns I reach for when a system has to reason, coordinate, and stay correct under real-world load.
Agent harness design
The loop that gives a model brains-and-hands: skill loading, tool resolution, context assembly, durable execution. The boring layer that makes the interesting part reliable.
- A thin runtime that loads declarative skills and wires them to a typed tool registry
- Context and state managed deliberately, not accumulated by accident
- Swappable models and providers - the harness is the asset, not the vendor
Brains and hands
A clean split between what the model reasons about and what the system actually does. Declarative skills describe intent; deterministic tools take action.
- Skills as domain-facing documents - readable and editable without touching code
- Tools as typed, tested Python - if it can be deterministic, it is
- A hard rule: if it needs a model, it's a skill; if it needs code, it's a tool
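The split above can be sketched in a few lines of Python. This is a minimal illustration, not the production harness - the registry, skill document, and names like `fetch_control` are hypothetical:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Tool:
    name: str
    fn: Callable[..., object]

class ToolRegistry:
    """Deterministic tools: typed, tested code the harness can call."""

    def __init__(self) -> None:
        self._tools: dict[str, Tool] = {}

    def register(self, name: str):
        def wrap(fn):
            self._tools[name] = Tool(name, fn)
            return fn
        return wrap

    def call(self, name: str, **kwargs):
        return self._tools[name].fn(**kwargs)

registry = ToolRegistry()

@registry.register("fetch_control")
def fetch_control(control_id: str) -> dict:
    # Deterministic lookup - no model involved, so it's a tool.
    return {"id": control_id, "status": "active"}

# A skill is a declarative, domain-facing document the harness loads at
# runtime and hands to the model - editable without touching code.
SCOPE_SKILL = """\
name: scope-control
intent: Decide which frameworks a control applies to.
tools: [fetch_control]
"""
```

The point of the split: the skill changes when the domain changes, the tool changes when the code changes, and neither edit requires the other.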
Durable, long-running workflows
Agent work is a workflow, not a request. Fan-out, aggregation, human gates, and staging gates that let agents operate safely on hour- and day-long timescales.
- Durable execution so agents survive restarts and deploys without amnesia
- Task chaining expressed declaratively - fan-out, many-to-one, conditional routing
- Staging gates that route risky writes to human review before anything commits
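The staging-gate idea reduces to a small pattern: risky writes are parked for review instead of committing directly. A minimal sketch, with hypothetical names - the real system runs this inside a durable workflow engine:

```python
from dataclasses import dataclass

@dataclass
class StagedWrite:
    target: str
    payload: dict
    approved: bool = False

class StagingGate:
    """Risky writes wait here until a human approves them."""

    def __init__(self) -> None:
        self.pending: list[StagedWrite] = []
        self.committed: list[StagedWrite] = []

    def propose(self, target: str, payload: dict) -> StagedWrite:
        # The agent proposes; nothing touches the real store yet.
        write = StagedWrite(target, payload)
        self.pending.append(write)
        return write

    def approve(self, write: StagedWrite) -> None:
        # Only a human review action moves a write to committed.
        write.approved = True
        self.pending.remove(write)
        self.committed.append(write)

gate = StagingGate()
w = gate.propose("controls/CC1.1", {"status": "implemented"})
# ... human reviews the diff in the dashboard ...
gate.approve(w)
```

In production the gate is durable state, so a restart or deploy mid-review loses nothing - the pending write is still there when the reviewer comes back.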
Multi-agent coordination
When one agent isn't enough. Confidence-based routing, decoupled state for different consumers, and a shared task lifecycle so humans and agents work from the same playbook.
- Goal-level planning that picks agents by capability and confidence, not a static table
- Decoupled state layers - what end users watch and what the runtime replays are different stores
- One task state machine covering agent work, human work, and the handoffs between
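Confidence-based routing is simpler than it sounds. A sketch under stated assumptions - the agent table, capability names, and the 0.6 floor are all illustrative:

```python
# Each agent declares what it can do and how confident it is for that
# capability (in practice, confidence comes from the evaluation pipeline).
AGENTS = [
    {"name": "evidence-agent", "capability": "collect_evidence", "confidence": 0.92},
    {"name": "policy-agent",   "capability": "draft_policy",     "confidence": 0.71},
    {"name": "policy-agent-2", "capability": "draft_policy",     "confidence": 0.55},
]

def route(goal: str, floor: float = 0.6) -> str:
    """Pick the most confident capable agent; below the floor, go to a human."""
    candidates = [a for a in AGENTS
                  if a["capability"] == goal and a["confidence"] >= floor]
    if not candidates:
        return "human-review"   # no confident agent: hand off to a person
    return max(candidates, key=lambda a: a["confidence"])["name"]
```

Because humans are just another route target in the same task state machine, "no confident agent" is a normal outcome, not an error path.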
Evaluation as a system
"Is this output good enough?" deserves its own reasoning pipeline, not a single prompt. Structured, staged, reviewable - so the answer is trustworthy and the failures are debuggable.
- Multi-stage assessment pipelines with clustering and consolidation steps
- Confidence scoring wired to routing and human gates
- Auditability: every agent action traceable and reviewable by a non-engineer
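The shape of a staged evaluation pipeline, reduced to a sketch. Stage logic, thresholds, and the min-score consolidation here are hypothetical stand-ins - the real stages are themselves model-backed:

```python
def stage_grounding(output: str, sources: list[str]) -> float:
    # Stub: in practice, the fraction of claims traceable to a source.
    return 0.9 if sources else 0.2

def stage_coverage(output: str, required: list[str]) -> float:
    # Stub: how many required items the output actually addresses.
    hits = sum(1 for r in required if r in output)
    return hits / len(required) if required else 1.0

def evaluate(output: str, sources: list[str], required: list[str]) -> dict:
    """Run each stage, consolidate conservatively, and emit a routable verdict."""
    scores = {
        "grounding": stage_grounding(output, sources),
        "coverage": stage_coverage(output, required),
    }
    confidence = min(scores.values())   # one weak stage sinks the whole output
    verdict = "pass" if confidence >= 0.8 else "human-review"
    # Returning per-stage scores keeps failures debuggable and auditable.
    return {"scores": scores, "confidence": confidence, "verdict": verdict}
```

The consolidation step is where this pays off: when an output fails, the per-stage scores say which dimension failed, and the confidence number feeds straight into routing and human gates.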
Full-stack product ownership
UI to infrastructure. I've built and shipped platforms where every layer - frontend, API, data, agent, integrations, CI, cloud - is mine.
- Next.js / React frontends with real-time task streaming and rich authoring
- Python and TypeScript services behind a multi-tenant Postgres with row-level security
- AWS-native deployment (CDK, Fargate, Lambda), observability, CI/CD
Agentic engineering as practice
The same harness-and-hands thinking turned inward: it's how I engineer, not just what I engineer.
- Research → plan → implement → verify, every task, enforced
- Teams of agents running in parallel for exploration, decomposition, and implementation
- Agent-driven QA (Playwright flows, screenshots, recordings, notes) producing sign-off artifacts for humans
- Languages: Python · TypeScript · JavaScript · Go · Solidity · C#
- Frameworks: FastAPI · Next.js · React · Tailwind
- Models: Claude · OpenAI · Gemini
- LLM ops: LangChain · Langfuse · DeepEval
- Data: PostgreSQL · MongoDB · Supabase
- Cloud & infra: AWS (CDK · Lambda · Fargate · SQS · API Gateway · EC2 · S3) · GCP · Docker · Vercel · Render · Replicate
- Workflow: Temporal · Celery · RabbitMQ · GitHub Actions
- Creative & 3D: Unity · Blender · Maya · Photoshop · After Effects · OpenGL
Receipts
Three roles, three hard agentic engineering problems - with the architecture decisions that made them work.
The problem: SOC 2 and adjacent compliance frameworks take 3-6 months of expert-led manual work, and controls drift the moment you stop watching. Companies need a platform that can both generate and maintain a compliance program - with an agent that doesn't hallucinate into production.
What I built: the platform, end to end.
- Product surface - two customer-facing Next.js apps: an audit dashboard (real-time task streaming, rich document authoring, evidence workflows, integration wizards, risk and vendor flows, role-based views for admins, collaborators, and auditors), and a public trust centre where customers share their compliance posture.
- AI Compliance Officer - one brain across chat, dashboard, and API. Thin harness, declarative skills over a typed tool registry, durable workflows, confidence-based routing, staging gates for risky writes. The reasoning layer is the asset, not the model behind it.
- EasyAudit Framework - a framework-agnostic control library. A 5-phase ML pipeline with dual-model concept extraction and NIST OLIR-based mapping normalises SOC 2, ISO 27001, NIST, HIPAA, and GDPR into one canonical library. Evidence collected once satisfies every framework mapped to the same control.
- Continuous monitoring - a separate service that OIDC-federates into each customer's cloud environment and runs compliance benchmarks continuously, feeding evidence back onto their controls automatically. I ship custom Go plugins for the providers the ecosystem doesn't cover (Azure AD, Intune, M365 Purview). Turns compliance from a point-in-time audit into a living system.
- Platform backbone - TypeScript API, Python services for the agent and monitoring, Lambda side-services for PDF reports and event-driven notifications. Multi-tenant Postgres with full row-level security and JWT-scoped roles.
- Infrastructure & QA - full AWS CDK across Fargate and Lambda, including self-hosted services for durable workflows and LLM observability. GitHub Actions, Secrets Manager, end-to-end tracing. Playwright E2E, Jest, Storybook, and a structured weekly walkthrough across test orgs and personas.
Why it's hard: compliance is a precision-and-recall problem at scale. Every control maps to many risks, every risk demands evidence, every piece of evidence may satisfy multiple controls across multiple frameworks - and an audit fails if you miss a single required link or drown the auditor in irrelevant material. Now put an agent inside that many-to-many graph: it can't hallucinate, it can't under-cover, it can't over-cover, and every write has to stay in sync with UI state, task state, workflow state, and audit trail - while humans are writing in too. Correctness and auditability come before speed.
Outcome: customers consistently achieve SOC 2 Type I in 3-4 weeks with zero audit exceptions - versus the 3-6 month industry standard. Onboarding under 30 minutes. Total customer effort under 5 hours.
Background: EasyAudit acquired Veita (see below) in early 2026 - we'd been their primary engineering partner, and the acquisition brought the product fully in-house.
The problem: hospital HR isn't one-policy-fits-all. A nurse's time-off policy depends on their department, job title, union, and employment type - cross-referenced against dozens of documents, many of them legacy scans and faxes. Existing AI chatbots return a nearest-neighbour guess and hope. In a regulated workplace, that's unacceptable.
What I built: the core platform - context-aware agents and a visual workflow editor, deployed as HR co-pilots across Canadian hospitals.
- Policy retrieval grounded in the user - the agent reads who you are from the HRIS (department, title, union, employment type) and retrieves only the policies that apply to you, with citations and deep-links into the source document.
- Legacy document ingestion - OCR and structural extraction over scanned and faxed PDFs so answers come from the actual source of truth, not a sanitised summary.
- Platform over fork - every hospital deployment had different sources, policies, and escalation paths. Customisation absorbed through configuration, not new code.
Why it's hard: a policy question is a many-to-many intersection - N user attributes × M policy scoping conditions - and the answer has to be the one correct match, with a citation a compliance officer can verify. Hallucinating here is a liability event, not an inconvenience.
Outcome: the agentic runtime became the seed of EasyAudit's AI Compliance Officer - used first for their control generation and scoping engine. EasyAudit acquired Veita in early 2026; I continued leading the engineering as Founding Engineer.
Six years building infrastructure for autonomous digital humans - AI-driven avatars that see, hear, and respond in real-time.
Conversational Architecture
The problem: a digital human on a live video call is a voice agent orchestrating other agents in the background (Salesforce, ServiceNow, OpenAI). Voice is sequential - unlike a chatbot, you can't show a wall of status updates. Say a flight just got cancelled: rebook the flight, find a hotel, notify the meeting's attendees - three tasks, three backend agents running in parallel, with results delivered one thread at a time while tracking what's been said, what's pending, and which thread the user thinks they're on.
What I built: a system that coordinates parallel backend agents with sequential voice delivery.
- Single source of truth - agents, UI, and conversation read/write to the same state.
- Synchronised delivery - UI appears exactly when the avatar speaks about it.
- Graceful interrupts - priorities change mid-conversation; in-flight work is superseded cleanly.
- Multi-task orchestration - backend agents collaborate without overwhelming the conversation.
Why it's hard: parallel computation plus sequential delivery plus interruptibility plus empathy. The avatar has to know what's been said, what's pending, what the user knows, and which thread they're currently addressing.
Digital Human Creation Pipeline
Outcome: cut digital human creation from 6 months of work by a ~20-person specialist team to 30 minutes of in-browser self-service by an end user.
- Automated artist workflows - took manual specialist processes and made them algorithmic.
- Baking microservice - finalises user creations in the Avatar Designer web app into deployable digital humans.
- Unity migration - rebuilt the entire 3D asset pipeline on Unity, replacing a proprietary OpenGL renderer. Opened standard pipelines and self-service creation.
A couple things I've built to learn something new.

ProductBoost
A Stable Diffusion pipeline for turning product photos into marketing-ready images - built end-to-end (auth, payments, AI inference, email) to learn the go-to-market side of shipping product. It processed real payments before I wound it down: ChatGPT shipped similar capabilities just as client work ramped up.
Squshy
A small agency for clients pushing the edge of blockchain and art. Helped artist friends recover smart contracts their devs had abandoned, and built browser games that turned NFTs into playable characters and skins. Debugged gas issues, rebuilt dapps, shipped staking and mint flows on Ethereum and Polygon.
Media Design School
University of Canterbury
Say hi on LinkedIn.