Product & Tech · 2025-12-13

Service-as-a-Software

Service-as-a-Software describes AI rollups tackling different industry verticals. Rather than selling AI software to services companies with a promise of cost savings, some entrepreneurs realized they can make more money by flipping the model: acquire the services company and capture its entire margin. The strategy works particularly well in real-economy small businesses where technology adoption is slow. Vinay Iyengar explains more here.

Savvy Wealth in NYC is an example of an AI rollup for wealth advisory services.

Being AI Native

Vinay explains that rollup teams often require founders to be "AI-native". AI-native means a product is designed from the ground up with AI as a core capability, not as a bolt-on feature. For example, a CRM with an "AI summary" button is not AI-native; a system where records are created, updated, and reasoned about by models is.

I have been advising a pre-seed startup building in a narrow slice of the insurance market. On paper, the market fits the description of where rollups can exist: the service providers are fragmented, the market is growing (climate-change tailwinds), the business is relationship-driven and built on human expertise, and technology expertise is scarce. However, the CEO, a very smart guy I have known for years, wants to start as a pure vertical software play.

Lean Into the Probabilistic

After the first few lines of code are written, every founder faces one of today's biggest tradeoffs: do you lean into probabilistic systems? Being AI-native now usually means building on transformer-based technology, which underpins small and large language models as well as the new generation of visual models. These models do not understand how our world works, but they can describe how it may work based on observations of what occurs frequently.

This choice is a scary one, because unbounded AI can quickly introduce chaos. In industries with low technology adoption, trust is fragile and easily eroded when models produce incorrect results. Sales cycles are long, and trust is not restored by explaining technical tradeoffs. As a result, the startup's CEO pushed to write as much deterministic code as possible (clear if/then/else logic).
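The tradeoff can be made concrete with a sketch. The domain, function names, and routing rules below are hypothetical, not the startup's actual logic; the point is the contrast between auditable rules and delegated model judgment.

```python
# Deterministic path: explicit rules, predictable, auditable.
def classify_claim_deterministic(claim: dict) -> str:
    """Route an insurance claim with explicit if/then/else rules."""
    if claim["amount"] > 50_000:
        return "manual_review"
    elif claim["type"] in {"flood", "wildfire"}:
        return "catastrophe_desk"
    else:
        return "auto_approve"

# Probabilistic path: delegate the same judgment to a model.
# `llm_complete` is a stand-in for any LLM client call.
def classify_claim_probabilistic(claim: dict, llm_complete) -> str:
    prompt = (
        "Route this insurance claim to one of: manual_review, "
        f"catastrophe_desk, auto_approve.\nClaim: {claim}"
    )
    return llm_complete(prompt)  # output can vary run to run
```

The deterministic version never surprises anyone; the probabilistic one can handle messy, unanticipated inputs, but the same input may not always produce the same route.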

Implementing Evals on Day 1

If a team chooses to lean into probabilistic frameworks from day one, then designing and maintaining evals is just as important as defining MVP features. Intense competition among foundational models has driven rapid improvements in reasoning, task completion, and large-context handling, while also pushing these capabilities toward commoditization. This gives startups a major advantage: the ability to choose model versions and providers based on real-time tradeoffs between cost, latency, and accuracy.
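A day-one eval does not need to be elaborate. Here is a minimal, dependency-free sketch of the idea: a labeled dataset plus a scorer, run against any callable "model", so providers can be compared on accuracy and latency. All names and data are illustrative.

```python
import time

# A tiny labeled dataset; real eval sets grow from production traffic.
DATASET = [
    {"input": "Policy covers flood damage up to $10k", "expected": "flood"},
    {"input": "Collision on the highway, rear-ended", "expected": "auto"},
]

def run_eval(model, dataset):
    """Return accuracy and mean latency for a model over a dataset."""
    correct, latencies = 0, []
    for example in dataset:
        start = time.perf_counter()
        prediction = model(example["input"])
        latencies.append(time.perf_counter() - start)
        correct += int(prediction == example["expected"])
    return {
        "accuracy": correct / len(dataset),
        "mean_latency_s": sum(latencies) / len(latencies),
    }

# Swapping providers means passing a different callable:
keyword_model = lambda text: "flood" if "flood" in text.lower() else "auto"
```

Because the harness only depends on a callable, an OpenAI-backed model, an Anthropic-backed one, or a cheap heuristic baseline can all be scored the same way, which is exactly the cost/latency/accuracy comparison commoditization makes possible.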

LangChain is an SDK for building AI-powered applications, with abstractions for models, prompts, and tools. LangSmith complements it with observability and evaluation tooling, providing a dashboard to inspect traces, run evals, and track metrics as models and configurations evolve.

An example of something I've built:

PDF
 └─► LlamaParse
      └─► Parsed artifacts
           └─► Extraction logic
                └─► Structured output
                     └─► LangChain evaluators
                          └─► LangSmith runs
                               ├─ datasets
                               ├─ scores
                               ├─ diffs
                               └─ regressions

This setup makes it easy to swap LlamaParse for Reducto or any new tool as the primary provider, or route complex tasks between them using different prompts, supported by accurate output comparisons. This example only scratches the surface, as agents and tool calling are foundational to modern systems of work.
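Provider swapping is cheap when every parser sits behind one interface. The sketch below shows the pattern with stub classes; the stubs stand in for real LlamaParse and Reducto SDK clients, and the routing rule is a made-up example.

```python
from typing import Protocol

class DocumentParser(Protocol):
    """Anything that turns PDF bytes into parsed text."""
    def parse(self, pdf_bytes: bytes) -> str: ...

class LlamaParseProvider:
    def parse(self, pdf_bytes: bytes) -> str:
        return "parsed-by-llamaparse"  # a real client would call the API here

class ReductoProvider:
    def parse(self, pdf_bytes: bytes) -> str:
        return "parsed-by-reducto"

def route(pdf_bytes: bytes, complex_doc: bool) -> str:
    """Send complex documents to one provider, simple ones to another."""
    parser: DocumentParser = ReductoProvider() if complex_doc else LlamaParseProvider()
    return parser.parse(pdf_bytes)
```

With evals scoring each provider's output on the same dataset, the routing rule can be tuned from measured accuracy rather than vendor claims.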

I liken evals to guardrails on a mountain road: they make stochastic systems feel safer to operate. Still, defining, maintaining, and evolving a strong suite of evals is non-trivial and demands meaningful engineering investment.

Operating Within Constraints

It is impossible to know up front whether becoming fully AI-native from day 0 is the right tradeoff; the answer depends on the industry the startup operates in. All startups are resource-constrained, and the decision seemingly goes against Paul Graham's "do things that don't scale." However, with AI-native newcos reaching $100M ARR in record time, we may be seeing a paradigm shift in the standard advice given to founders.

I believe the CEO made the best decision available at the time, and only time will reveal its outcome. I also hold a healthy tension: success is not binary, and each path can lead to a different form of it, whether a traditional SaaS model optimized for ARR and net retention or a model oriented around EBITDA margins and private-equity sensibilities.