Home, essays, projects, and operating-layer notes.

Essays, workflow notes, and systems writing.

Organic software is expert-led, AI-assisted software that grows through real use. It is dynamic, but not loose: the builder keeps changing the app as the work reveals itself, while using judgement, tests, and feedback to stop the system from becoming a mess.

GPT-5.6 Sol, Terra, Luna, and the Cerebras launch point at a more practical set of questions: which lane should this work run in, how fast should it move, and when should a human stay close?

Every's eight-level AI adoption map is useful because it measures delegation, trust, context, and verification, not personal intelligence. This version expands the ladder and pairs it with a self-assessment that checks how people actually work with AI.

Cerebras is interesting beyond its IPO because near-1,000-token-per-second inference changes the shape of the work: draft, check, repair, compare, and return before the human loses the thread.

The useful question is how much AI capability would still remain if the proprietary frontier disappeared tomorrow.

The common claim is that AI cannot replace your job. That may be true. But jobs are bundles of functions, and the better question is which functions are already moving closer to reliable AI handoff.

Loophole is not interesting because it solves ethics. It is interesting because it treats moral principles like something you can draft, attack, patch, and escalate until the real conflicts in your values finally surface.

Attio, Linear, and PostHog are converging on the same product bet: one fixed dashboard cannot serve every user or every intent, so the SaaS homepage is shifting toward an agentic entry point. Chat is the router. Generative UI is what comes next.

The next real step for agents is genuine multi-agent runtime across different harnesses: paired review loops, handoffs, shared threads, and open protocols such as A2A.

As of March 24, 2026, mobile AI coding is splitting around where the work runs: a phone steering your own machine, or a phone watching cloud agents work somewhere else.

Claude Code autonomy is not one trick. It starts with permission friction and context hygiene, then moves through subagents and loop hooks, and becomes genuinely useful when the system can measure whether the last iteration actually improved the work.

The newest memory systems do not make language models inherently stateful. They build recall, updates, and temporal reasoning around the model instead. Gemini Embedding 2 is the first multimodal vector embedding model offered at mass scale.

The most useful way to understand AI progress isn't by tracking model names. It's by tracking the operating model: from chatbots that only talk to systems that can plan, act, coordinate, and increasingly sit across software.

AI can speed up individual tasks, but that does not automatically create more free time. From personal experience, the saved capacity gets filled by more complex tasks and higher expectations.

I write, build, and teach around AI, learning, and digital work.

A public interactive explainer showing how modern agent memory systems layer extraction, versioning, multimodal recall, and source grounding around stateless models.

A tiny agent-prep game where you build the tray an AI agent needs before it can do useful work.

A public interactive grimoire exploring AI through ritual, agency, memory, and dependence across twelve short chapters.

A public interactive map tracking major AI data center projects, locations, investment, and planned capacity.

A compact checklist for deciding whether an AI output is actually usable.

A living clock for four AI-exposed work functions, showing Bob's month-level estimates for when specific activities move closer to reliable AI handoff.

A practical AI proficiency self-assessment that estimates your current operating level from chatbot use through copilot, agents, autopilot, workflows, background assistants, multi-agent work, and orchestration.

A supply and demand simulator for AI-exposed work units, showing how abundance can devalue tasks before whole jobs disappear.

A living map of the AI tools I use, what each one is for, and how I keep a large stack from turning into tool sprawl.

Working notes on how to move from prompt to draft to review without losing clarity.

Essays, resources, and experiments on where AI is going and how to use it.

A public interactive guide to Claude Code, covering slash commands, memory, skills, hooks, MCP servers, subagents, and workflow patterns.

Email for thoughtful AI, project, speaking, or advisory enquiries.

Small interactive apps, visual explainers, and interface sketches that are useful to open, test, and play with.

A public interactive guide to the five levels of Claude Code autonomy, from permission friction through subagents and loop hooks to evaluation-driven improvement.

A public tracker for the moving floor of open-weight AI against the proprietary frontier ceiling.

A public interactive guide to agent-to-agent communication patterns, covering paired review, typed handoffs, shared threads, swarm routing, and the A2A protocol.

A small interactive demo for feeling how near-1,000-token-per-second inference changes agent loops and human attention.

A public interactive explainer for Loophole, Brendan Hogan's adversarial moral-legal code system for drafting, attacking, patching, and escalating edge cases in your own principles.

A public interactive guide to the split between local supervision, self-hosted mobile bridges, and cloud autonomy in mobile AI coding.

Short email updates on new essays, projects, and useful AI notes from the site.

A planned companion tool for turning a rough app idea into an organic build plan: product spine, growth loop, pruning rules, AI build packet, and release rings.

Smaller builds, tools, artifacts, and prototypes linked to the ideas on the site.

A playful prompt-mixing experiment where sliders brew a reusable AI prompt recipe.

A playable inventory of tool notes, checklists, workflow pages, and reusable AI materials connected to the rest of the site.

A public interactive companion to The SaaS Homepage Is Becoming an Agent, tracing the shift from dashboard-first software to chat-first entry points and then to generated interfaces.

A public month-by-month timeline of AI breakthroughs, model launches, open-weight releases, media models, and agent shifts from GPT-3 through July 9, 2026.

A public interactive timeline and story view tracing large language models from Transformer to the July 10, 2026 frontier, including GPT-5.6, Grok 4.5, Muse Spark 1.1, voice, agents, and open weights.

A public interactive guide to how AI evolved from chatbots into agentic systems, orchestrators, and software that behaves more like an operating system.

Bob Z

Inference Speed Lab

Inference speed demo

Inference Speed Lab is a small simulator for feeling the difference between a model that can generate close to 1,000 output tokens per second and the slower frontier reasoning loops most AI products were designed around.

It was built as part of the research for the Cerebras essay. Speed does not replace judgement, and one model release is not the whole story. What speed does change is the shape of the interaction: what can stay synchronous, what needs a queue, and how many drafts, checks, and repairs can fit inside one attention window.

What it demonstrates

The app starts with a Cerebras Kimi K2.6 lane set to 981 output tokens per second, based on Cerebras' enterprise-trial measurement. Kimi is the measurement anchor here, not the whole argument. OpenAI separately announced GPT-5.6 Sol on Cerebras at up to 750 tokens per second for selected customers. The comparison lane is intentionally adjustable because GPT-5.6, Claude, and other frontier systems vary by effort level, serving route, provider, and workload.

Why the wait changes the work

Slow inference pushed AI products toward progress spinners, background jobs, parallel agent runs, and streaming interfaces that exist mostly to hide latency.

Fast inference changes that default. A short answer may not need streaming. A multi-step agent loop may fit inside the request path. A critic pass can become part of the same turn instead of a separate task.
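One way to make that shift concrete is a rough rule that picks an interaction style from the expected wait. The sketch below is illustrative only: the function name `interaction_mode` and both thresholds (2 seconds for a plain synchronous reply, 30 seconds before moving work to a queue) are assumptions for this example, not values taken from the demo.

```python
def interaction_mode(
    output_tokens: int,
    tokens_per_second: float,
    sync_budget_s: float = 2.0,   # assumed: longest wait that still feels instant
    queue_after_s: float = 30.0,  # assumed: beyond this, run it in the background
) -> str:
    """Pick an interaction style from the estimated generation time."""
    wait_s = output_tokens / tokens_per_second
    if wait_s <= sync_budget_s:
        return "synchronous"  # short enough to return in one piece
    if wait_s <= queue_after_s:
        return "stream"       # show tokens as they arrive to cover the wait
    return "queue"            # hand off to a background job


# The same 400-token answer on a fast lane and on a hypothetical 60 tok/s lane.
print(interaction_mode(400, 981))
print(interaction_mode(400, 60))
```

Under these assumed thresholds, the same short answer moves from needing a streaming interface to fitting in a single synchronous reply once the lane is fast enough.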

How to read the numbers

The demo is a timing model, not a benchmark. It uses output-token speed to make the waiting time visible, then adds a separate agent-loop calculator for repeated model calls plus non-model overhead.
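A minimal sketch of that kind of timing model might look like the following. It is not the demo's actual code. The `Lane` type, the function names, the 80 tokens-per-second comparison lane, the token counts, and the 0.4-second per-call overhead are all assumptions chosen for illustration; only the 981 tokens-per-second figure comes from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lane:
    name: str
    tokens_per_second: float  # output-token speed for this lane


def generation_seconds(output_tokens: int, lane: Lane) -> float:
    """Time spent waiting for output_tokens at the lane's output speed."""
    return output_tokens / lane.tokens_per_second


def agent_loop_seconds(
    calls: int, tokens_per_call: int, overhead_per_call_s: float, lane: Lane
) -> float:
    """Sequential loop: every call pays generation time plus non-model overhead."""
    per_call = generation_seconds(tokens_per_call, lane) + overhead_per_call_s
    return calls * per_call


fast = Lane("Cerebras Kimi K2.6", 981.0)  # anchor figure quoted in the text
baseline = Lane("comparison lane", 80.0)  # hypothetical, adjustable baseline

for lane in (fast, baseline):
    single = generation_seconds(1500, lane)
    loop = agent_loop_seconds(
        calls=6, tokens_per_call=800, overhead_per_call_s=0.4, lane=lane
    )
    print(f"{lane.name}: single answer {single:.1f}s, six-call loop {loop:.1f}s")
```

Keeping the non-model overhead as a separate term matters: as generation gets faster, tool calls and network round trips take up a larger share of the loop, so raising output speed alone does not shrink the total wait proportionally.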

The sourced hard number in the simulator is Cerebras' Kimi K2.6 result. The GPT-5.6 figure is an announced upper rate for a separate selected-customer route, not the simulator's measured anchor. The comparison lane is a practical baseline you can move around.