Digest

What I read today

A daily dig through my RSS feeds — the links and ideas worth keeping, each with a short note on what it's about.

July 1, 2026 (Wednesday), 10 reads

Product

AI Products That Are Hard to Eval Are Usually Hard to Trust

— Hamel Husain

If users have to redo the AI's work to verify it, the issue is not just evaluation; the product has not made sources, definitions, intermediate steps, and unverifiable claims first-class artifacts.

Agents

A Java Migration Benchmark Shows Compilation Is Not Success

— Hugging Face / IBM Research

ScarfBench brings enterprise Java migration evaluation back to system reality: success means build, deploy, and behavioral validation all pass, while today's strongest agents still stay below a 10% behavioral success rate.

Architecture

Enterprise Agents Are About Encoding Process, Not Just Connecting a Model

— Latent Space

Sierra's agent engineer and FDE model shows that the hard work is encoding customer workflows, APIs, brand voice, release governance, and verification paths into the system.

Architecture

Agent Leverage Comes From Loops, Not Prettier Prompts

— Latent Space

Loopcraft moves people from writing one-off prompts into designing loops; goals, feedback, routing, validation, budgets, and permission boundaries are the real leverage in agent systems.

Trust

Hidden Markers Hurt Developer-Tool Trust More Than Detection Itself

— thereallo.dev

The Claude Code prompt-steganography debate is less about whether anti-abuse detection is reasonable and more about how local developer tools need visible, auditable policies when they sit near files, commands, and credentials.

Types

Parsing Turns Validation Into a Type-Carried Proof

— cekrem.github.io

Parse-don't-validate is not about adding checks everywhere; it is about narrowing untrusted input at request, URL, database, and env boundaries so later code can rely on domain types instead of memory.
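A minimal sketch of the idea at an env-var boundary, assuming a hypothetical `Port` type and `parse_port` function (neither comes from the linked post). The raw string is narrowed once, and downstream code accepts only the parsed type. Python does not stop other code from constructing `Port` directly, so the guarantee here rests on convention rather than on the compiler.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Port:
    """Hypothetical domain type: a TCP port already known to be in 1..65535."""
    value: int


def parse_port(raw: str) -> Port:
    """Narrow an untrusted string (for example from an env var) into a Port, or raise."""
    try:
        number = int(raw.strip())
    except ValueError:
        raise ValueError(f"not an integer: {raw!r}") from None
    if not 1 <= number <= 65535:
        raise ValueError(f"port out of range: {number}")
    return Port(number)


def listen_address(host: str, port: Port) -> str:
    # No range check here: holding a Port is the evidence that parsing succeeded.
    return f"{host}:{port.value}"


address = listen_address("127.0.0.1", parse_port(" 8080 "))
```

The check lives in exactly one place; every function that takes a `Port` relies on the type instead of re-validating or remembering that someone validated earlier.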

Systems

A CUDA Kernel Launch Is a CPU-Driver-GPU Protocol

— Fergus Finn

A vector-add kernel hides a full protocol stack: nvcc, PTX/SASS, the host launch stub, driver ioctls, pushbuffers, GPFIFO, QMD, doorbells, SMs, and warp scheduling.

Systems

Rust Did Not Drop the Error; the Code Dropped Async State

— Cloudflare

Cloudflare's truncated-response bug came from hyper's HTTP/1 state machine discarding Poll::Pending: bytes stayed buffered, the connection shut down, and clients saw an early EOF.

Infrastructure

Data-Center Costs Show Up First in Local Power Bills and Grid Investment

— 404 Media

Henrico County shows how AI and cloud infrastructure costs do not live only inside cloud invoices; they spill into local grid investment, rate allocation, and public-institution power bills.

Risk

Radiation Risk Turns on Dose Rate and Statistical Noise, Not Total-Dose Slogans

— Works in Progress

Low-dose radiation risk cannot be collapsed into one cumulative-dose fear index; the useful frame separates dose rate, exposure path, control, consent, and statistical uncertainty.

June 29, 2026 (Monday), 6 reads

Architecture

GLM 5.2 Shows Agent Results Depend on the Harness

— Semgrep / Hacker News

GLM 5.2 scores 39% F1 on IDOR detection, ahead of Claude Code's 32%, but Semgrep's own multimodal harness reaches 53–61%; the useful comparison is the full system of model, context selection, output parsing, and execution loop.

Security

`.agentignore` Is Not a Security Boundary

— GitHub issue / Hacker News

Ignore files reduce noise and express intent, but if the agent process can still read a secret, tool output, search results, and logs can leak it; the real boundary has to come from the OS, containers, VMs, or least-privilege credentials.

Workflow

Agents Belong in the Human Loop

— Jon Udell via Simon Willison

Jon Udell argues against reducing people to approval buttons; the better design keeps human plans, queues, review, and history as the main loop, with agents joining through visible, recoverable small steps.

Systems

WAL-RUS Is About Predictable Memory, Not Speed

— ClickHouse / Hacker News

ClickHouse's Rust rewrite of its WAL archiver matters less as a generic speed story than as a resource-predictability story: under WAL-heavy load, virtual memory falls from nearly 2.8 GB to under 1 GB.

Open Models

Open Models Are Splitting by Business Motive

— Interconnects

Open-weight releases are no longer a single movement led by a few players; pure model makers, Big Tech, product companies, and sovereign AI efforts all open models for different economic reasons.

Strategy

AI Competition Turns Capability Lead Into a Price War

— Gary Marcus

Gary Marcus reads China's model catch-up as a no-moat story: more competitors, lower token prices, thinner margins, and a costly paradigm whose capability lead may not become a durable business moat.

June 24, 2026 (Wednesday), 14 reads

Agents

The Coming Loop

— Armin Ronacher

On the two loops inside agentic coding — the inner agent loop that ends when the model says "done," and the outer harness loop that decides whether to keep going — and why the second is remarkable on disposable, verifiable work but corrosive on code meant to last.
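A rough sketch of the two loops, with every name hypothetical (`inner_agent_loop`, `outer_harness_loop`, `fake_model`, and `verify` are illustrative stand-ins, not taken from the essay). The inner loop stops when the model replies "done"; the outer loop trusts only an external check and retries when that check fails.

```python
from typing import Callable, List, Optional, Tuple


def inner_agent_loop(model_step: Callable[[List[str]], str], task: str,
                     max_steps: int = 20) -> List[str]:
    """Run model steps until the model itself says "done" or the step budget runs out."""
    transcript = [task]
    for _ in range(max_steps):
        reply = model_step(transcript)
        transcript.append(reply)
        if reply == "done":
            break
    return transcript


def outer_harness_loop(model_step: Callable[[List[str]], str],
                       verify: Callable[[List[str]], bool],
                       task: str,
                       max_attempts: int = 3) -> Tuple[Optional[int], List[str]]:
    """Re-run the inner loop until an external check passes or attempts run out."""
    transcript: List[str] = []
    for attempt in range(1, max_attempts + 1):
        transcript = inner_agent_loop(model_step, task)
        if verify(transcript):
            return attempt, transcript
        task = task + " (previous attempt failed verification)"
    return None, transcript


def fake_model(transcript: List[str]) -> str:
    # Stand-in for a real model: one action, then "done".
    if transcript[-1] in ("draft", "fix"):
        return "done"
    return "fix" if "failed verification" in transcript[0] else "draft"


attempt, transcript = outer_harness_loop(fake_model, lambda t: "fix" in t, "make tests pass")
```

In this toy run the model declares itself done on the first attempt, the check disagrees, and the harness sends it around again. That only works when a cheap, reliable `verify` exists, which is the condition the note above attaches to disposable, verifiable work.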

Engineering

Slow Down to Speed Up

— The Pragmatic Engineer

How the November 2025 agents multiplied code output while human review stayed flat — and how Meta's largest-ever incident traced back to AI-written, AI-reviewed code shipping past a gutted Trust & Safety team.

Agents

Coinbase cut idea-to-production by 90%

— Cursor

How Coinbase compressed its delivery cycle from 20 days to 1.8 days using Plan Mode and five-to-seven parallel agents, with 75% of pull requests now opened by an agent.

Security

Prompt Injection as Role Confusion

— Simon Willison

A research finding that models infer who is speaking from a text's style rather than its role tags — and that rewriting an attack to read slightly off-format drops its success rate from 61% to 10%.

Security

After Mythos: AI Red-Teaming with Gray Swan

— Latent Space

On an automated red-teamer that now out-ranks human professionals, the finding that larger models are not automatically safer, and the "Lethal Trifecta" — untrusted input, private data, and an exfiltration path together.

Open Models

GLM-5.2 Is the Step Change for Open Models

— Interconnects

Why GLM-5.2 is the first open-weight model that works as a general agent inside a Claude Code-style harness, narrowing the US–China gap to about 6.8 months at a fraction of the price.

Research

VibeThinker: 3B Matches the Giants on Verifiable Reasoning

— arXiv

A 3-billion-parameter model that ties 600B–1T flagships on math and competitive programming where answers are machine-checkable, via a two-stage "Spectrum-to-Signal" post-training recipe.

Agents

CUGA: An Open Agent Harness

— Hugging Face / IBM

IBM's open harness tops AppWorld and WebArena on an open-weight model by moving planning, state, and reflection into the harness, leaving developers to write only tools and prompts.

Tools

Oak: A Version Control System Rebuilt for Agents

— oak.space

Flat Mercurial-style manifests and lazy mounting give an agent seconds-to-first-edit on a multi-GB monorepo without cloning the whole thing — at the cost of leaving the Git ecosystem behind.

Architecture

In Praise of memcached

— jchri.st

An argument that memcached suits caching precisely because it does less — no persistence, no clustering — forcing correct "cache can vanish" semantics and sidestepping the Redis-as-database trap.
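The "cache can vanish" rule can be sketched as a cache-aside read path. `VolatileCache` below is an in-process stand-in written for this example, not a memcached client, and `load_profile` and the dictionary used as the source of truth are likewise hypothetical.

```python
from typing import Any, Dict, Optional


class VolatileCache:
    """In-process stand-in for a memcached-style cache: entries may disappear at any time."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}

    def get(self, key: str) -> Optional[Any]:
        return self._data.get(key)

    def set(self, key: str, value: Any) -> None:
        self._data[key] = value

    def flush(self) -> None:
        # Simulates eviction or a restart: everything is gone.
        self._data.clear()


def load_profile(user_id: int, cache: VolatileCache, source_of_truth: Dict[int, dict]) -> dict:
    """Cache-aside read: a miss is normal and falls through to the real store."""
    key = f"profile:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    value = source_of_truth[user_id]  # the only authoritative copy
    cache.set(key, value)
    return value


database = {42: {"name": "Ada"}}
cache = VolatileCache()
first = load_profile(42, cache, database)
cache.flush()  # the cache vanishes
second = load_profile(42, cache, database)
```

Because nothing is ever written only to the cache, losing it costs latency and not data, which is the property the note above credits memcached with enforcing.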

Science

GPT-5 Pro Helps Crack a 3-Year Immunology Mystery

— OpenAI

How GPT-5 Pro gave an immunologist a new angle on T-cell behavior that explained an experiment he had been unable to account for over three years.

Game Dev

The Low-Tech AI of Elden Ring

— nega.tv

How FromSoftware builds boss behavior without planning algorithms — a pushdown-automaton goal stack, weighted-random action selection, and interrupt callbacks that keep designers in full control.
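A toy sketch of the three ingredients named above: a goal stack, weighted-random action selection, and an interrupt callback. The `Boss` class, the action names, and the weights are all invented for illustration; this is not FromSoftware's code or its scripting API.

```python
import random
from typing import Dict, List


class Boss:
    """Hypothetical boss brain: the top of the goal stack decides what happens next."""

    def __init__(self, rng: random.Random) -> None:
        self.rng = rng
        self.goals: List[str] = ["idle"]  # bottom goal is never popped
        self.log: List[str] = []          # actions actually performed

    def choose_attack(self, weights: Dict[str, int]) -> str:
        # Weighted-random pick: designers tune the numbers, not an algorithm.
        actions = list(weights)
        return self.rng.choices(actions, weights=[weights[a] for a in actions], k=1)[0]

    def on_player_heals(self) -> None:
        # Interrupt callback: push a punish goal on top of whatever was planned.
        self.goals.append("punish_heal")

    def tick(self) -> None:
        goal = self.goals[-1]
        if goal == "idle":
            # Nothing queued: pick an attack and push it as the next goal.
            self.goals.append(self.choose_attack({"slash": 6, "lunge": 3, "grab": 1}))
        else:
            # Perform the top goal, then pop back to the one beneath it.
            self.log.append(goal)
            self.goals.pop()


boss = Boss(random.Random(0))
boss.tick()             # idle, so an attack goal is pushed
boss.on_player_heals()  # interrupt lands on top of the queued attack
boss.tick()             # performs punish_heal
boss.tick()             # resumes the attack that was queued first
```

The stack is what lets the interrupt pre-empt the queued attack without discarding it: once the punish goal pops, the earlier goal is back on top.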

Infrastructure

Why American Data Centers Can't Plug In

— Works in Progress

Why the bottleneck for AI data centers is not power but a first-come interconnection queue that fills with speculative projects — and how auctioning slots and pricing flexibility could clear it.

Cities

Why the West Stopped Making Land

— Works in Progress

How land reclamation stalled across the West around 1970 — not by prohibition, but by litigable environmental review that pushed single-project approval times into decades.
