March 23, 2026
Inspiration
iPhone 17 Pro Demonstrated Running a 400B LLM
A 400-billion-parameter model running on an iPhone 17 Pro at 0.6 tokens per second. The technique streams data from SSD to the GPU so the massive 1M-token KV cache can exceed what RAM alone holds, "double dipping" into storage as if it were memory. The thing that stopped me: this is not a toy demo. It is a production inference stack running on hardware you can buy today. The compression of intelligence into a form that fits in your pocket is no longer theoretical.
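The core idea, stripped to its essentials, is that the full cache lives in a file and the OS pages blocks in on demand, so peak RAM stays at a small working set. A minimal sketch, assuming a block-structured float32 cache; all names and sizes here are illustrative, not the actual iPhone inference stack:

```python
# Sketch of an SSD-streamed KV cache: the whole cache is a file on disk,
# memory-mapped so blocks are paged in only when attention touches them.
# BLOCK_TOKENS / HEAD_DIM / N_BLOCKS are hypothetical toy dimensions.
import mmap
import os
import tempfile

import numpy as np

BLOCK_TOKENS = 16
HEAD_DIM = 8
N_BLOCKS = 64
BLOCK_BYTES = BLOCK_TOKENS * HEAD_DIM * 4  # float32

# A temp file stands in for the phone's SSD.
path = os.path.join(tempfile.mkdtemp(), "kv_cache.bin")
with open(path, "wb") as f:
    f.write(np.arange(N_BLOCKS * BLOCK_TOKENS * HEAD_DIM,
                      dtype=np.float32).tobytes())

with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

def load_block(i: int) -> np.ndarray:
    """Page one KV block in from 'SSD'; the OS caches hot pages in RAM."""
    off = i * BLOCK_BYTES
    buf = mm[off:off + BLOCK_BYTES]
    return np.frombuffer(buf, dtype=np.float32).reshape(BLOCK_TOKENS, HEAD_DIM)

# Long-context attention walks the blocks one at a time, so resident
# memory is one block's worth, not the whole 1M-token cache.
block = load_block(3)
print(block.shape)  # (16, 8)
mm.close()
```

The trade is exactly the one in the demo: tokens per second drop because each block ride-alongs on SSD latency, but context length is no longer bounded by RAM.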
Autoresearch on an old research idea
A researcher picked up legacy code from an eCLIP project and handed it to Claude Code with Karpathy's autoresearch pattern: a tight loop of hypothesize, edit, train, evaluate. Sandboxed in a container, no network, no pip. The human did chores. The machine iterated. The point isn't automation — it's that the iteration speed of a well-designed loop exceeds what a human can sustain manually. The researcher went to fold laundry and came back to progress.
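The loop itself is simple enough to fit in a page. A toy version of the hypothesize, edit, train, evaluate cycle, assuming the "experiment" is gradient descent on a quadratic and the "hypotheses" are candidate learning rates; this is an illustration of the pattern, not the actual Claude Code / eCLIP setup:

```python
# Toy autoresearch loop: propose a change, run the experiment,
# keep it only if the evaluation improves.
import random

def train_and_evaluate(lr: float, steps: int = 50) -> float:
    """Gradient descent on f(x) = (x - 3)^2; returns the final loss."""
    x = 0.0
    for _ in range(steps):
        grad = 2 * (x - 3)
        x -= lr * grad
    return (x - 3) ** 2

def autoresearch(iterations: int = 20) -> tuple[float, float]:
    random.seed(0)  # deterministic for the demo
    best_lr = 0.01
    best_loss = train_and_evaluate(best_lr)
    for _ in range(iterations):
        # Hypothesize: perturb the current best configuration.
        candidate = max(1e-4, best_lr * random.uniform(0.5, 2.0))
        # Train + evaluate: run the experiment end to end.
        loss = train_and_evaluate(candidate)
        # Keep the edit only if it measurably helps.
        if loss < best_loss:
            best_lr, best_loss = candidate, loss
    return best_lr, best_loss

lr, loss = autoresearch()
print(f"best lr={lr:.4f} loss={loss:.6f}")
```

The sandbox constraints in the story (container, no network, no pip) matter because they make every pass through this loop cheap, reproducible, and safe to run unattended, which is what lets the machine out-iterate a human.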
Log File Viewer for the Terminal
lnav: a log file viewer that is small in scope without being small in capability. Merge, tail, search, filter, and query log files through a SQL interface, with automatic format detection and compressed-file handling. No server, no setup. The elegance here is restraint: the right tool for the problem, and nothing more. Sometimes the most profound engineering is knowing what to leave out.