Notebook

Field notes from the studio.

Writing for ML engineers, applied scientists, and the leaders deciding where to bet on AI next.

EVALS

Why your "90% accurate" LLM is failing in production

The gap between offline eval and live performance is almost always a coverage problem. A practical playbook.

Jun 2, 2026

AGENTS

Tool use is the new prompt engineering

Frontier model quality has converged. The remaining alpha is in the tool surface you expose to the agent.

May 26, 2026

RAG

When to retrieve, when to fine-tune, when to do both

A decision tree from 40+ deployments. The defaults most teams pick are wrong about a third of the time.

May 19, 2026

COST

We cut our LLM bill 87% without losing quality

Five techniques: cascading, distillation, structured outputs, semantic caching, and ruthless prompt compression.

May 11, 2026

SAFETY

Red-teaming production agents: a starter battery

42 attack templates that catch real-world misuse before your users do.

May 2, 2026

GOVERNANCE

EU AI Act for US enterprises: what actually applies

Most of it does not. Here is what does, and how to map your existing GRC stack to it.

Apr 22, 2026