Inference-time frontiers / agent systems / Bitcoin side-thread

Mapping capability surfaces after training.

A research and engineering lab for what models can do after training — through prompting, tools, orchestration, and system design. We build shell-native AI tools, agent systems, and Bitcoin-native interfaces.

atlas://pilabs/capability-map.v2
INFERENCE-TIME RESEARCH · AGENT SYSTEMS · PI LABS CORE · BITCOIN THREAD · signal drift / emergent behavior / tool loops
A1

Hypothesis-driven experiments around runtime capability.

A2

Tool-using agents with memory, revision, and collaboration.

A3

Low-level software and Bitcoin-native systems under constraints.

What we build

Agent tools, CLI interfaces, runtime experiments, and systems that can act.

How we work

Repeated runs, failure analysis, direct tools, and evidence over anecdotes.

Technical edge

Low-level taste, inspectable systems, and a Bitcoin thread.

Operating mode

Shell-first, evidence-aware, and mildly suspicious of abstraction theater.

Primary output

Research artifacts that become tools, interfaces, and systems people can use.

Lab composition

Humans, AI agents, and shared systems that increasingly blur the boundary.

Focus Areas

Three active frontiers. One lab.

[01]

Capability Research

We treat large models as substrates for structured experiments. Prompting, tool use, looping, memory, shell composition, repeated runs, and failure analysis are part of the method. We like the moment when a system does something odd enough to be worth studying.

[02]

Agent Engineering

Our tools inspect, reason, act, revise, and collaborate. Some become products. Some stay as instruments that help us probe the space of what is now possible. We are interested in useful intelligence, not decorative intelligence.

[03]

Bitcoin Systems

Not the whole story, but an important part of it. Constraints, scripting, low-level software, and weird computational edges remain part of the lab’s technical DNA. It keeps our taste honest.

Selected Projects

Built to run, not to demo.

Research Workspace running

latent-garden

A workspace for emergent behavior experiments at the boundaries of inference-time compute. Hypotheses, observations, repeated runs, and a path from weird result to written finding.

$ run experiment

> capture traces

> compare model behavior across runs
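The run-capture-compare loop above can be sketched in a few lines. This is a minimal illustration, not latent-garden itself: `query_model` is a hypothetical stub standing in for a real inference call, with drift simulated so the comparison has something to notice.

```python
from collections import Counter

def query_model(prompt: str, run: int) -> str:
    # hypothetical stub: a real version would call an inference API and
    # return model text; here we simulate drift between runs
    return f"answer-{run % 2}"

def repeated_runs(prompt: str, n: int = 5) -> Counter:
    # capture one trace per run, then tally distinct behaviors across runs
    traces = [query_model(prompt, i) for i in range(n)]
    return Counter(traces)

# more than one distinct output is the interesting case:
# the anomaly may be the finding
print(repeated_runs("2+2?"))
```

With a real model behind `query_model`, the counter becomes a cheap first signal of non-determinism worth a closer look.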

Low-Level AI CLI live

aic

A low-level CLI for AI APIs, closer to `curl` than a chat box. Good for people who want the wire, the system, and the option to turn prompts into tools instead of stopping at text.

Bitcoin Tooling active

piIDE

An educational environment for Bitcoin Script and piScript with debuggers, execution flow, and an interface that stays close to the machine.
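For flavor, here is a toy stack evaluator in the spirit of Bitcoin Script. It is an illustration only, not piIDE's engine: real Script has many more opcodes, size limits, and consensus rules, but the stack discipline it teaches looks like this.

```python
# Toy evaluator for a Bitcoin-Script-like stack machine (illustration only).
def eval_script(ops):
    stack = []
    for op in ops:
        if isinstance(op, int):
            stack.append(op)                  # push data onto the stack
        elif op == "OP_ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)               # arithmetic consumes two, pushes one
        elif op == "OP_EQUAL":
            b, a = stack.pop(), stack.pop()
            stack.append(1 if a == b else 0)  # truthy top means script success
        else:
            raise ValueError(f"unknown op: {op}")
    return stack

# the script `2 3 OP_ADD 5 OP_EQUAL` ends with 1 (true) on the stack
print(eval_script([2, 3, "OP_ADD", 5, "OP_EQUAL"]))  # [1]
```

Watching the stack mutate opcode by opcode is exactly the kind of execution flow an educational debugger wants to surface.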

Working Principles

We like systems that become more interesting when stressed.

/01

Prefer direct tools and inspectable systems over sealed magic.

/02

Count weird behavior as evidence. The anomaly may be the finding.

/03

Build interfaces that help people think, compare, and notice.

/04

Design with room for agents to become collaborators, not only tools.

Field Notes

Not publishing yet, but leaving a shape for it.

Reserved channel

This section stays quiet until there is something worth saying.

When notes are ready, they should read like field reports, experiment diaries, tool essays, or carefully shaped fragments. Until then, this is a placeholder with standards.

Possible direction

Inference-time compute as a design material, not just a budget line.

Possible direction

Agents as collaborators: where the theater stops and the work begins.

Possible direction

Why low-level tools still matter when the orchestration layer gets loud.