Digest

What I read today

A daily dig through my RSS feeds — the links and ideas worth keeping, each with a short note on what it's about.

Wednesday, July 1, 2026 (10 reads)
Agents
A Java Migration Benchmark Shows Compilation Is Not Success
— Hugging Face / IBM Research

ScarfBench grounds enterprise Java migration evaluation in whole-system behavior: success means the build, the deployment, and behavioral validation all pass, and today's strongest agents still stay below a 10% behavioral success rate.

Architecture
Agent Leverage Comes From Loops, Not Prettier Prompts
— Latent Space

Loopcraft moves people from writing one-off prompts into designing loops; goals, feedback, routing, validation, budgets, and permission boundaries are the real leverage in agent systems.

Types
Parsing Turns Validation Into a Type-Carried Proof
— cekrem.github.io

Parse-don't-validate is not about adding checks everywhere; it is about narrowing untrusted input at request, URL, database, and env boundaries so later code can rely on domain types instead of memory.
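
To make the idea concrete, here is a minimal sketch of parsing at an env boundary. The `Port` type and the `parse_port` and `listen_address` functions are hypothetical names chosen for illustration, not taken from the linked post. Python does not stop other code from constructing `Port` directly, so the guarantee here rests on the convention that `parse_port` is the only constructor anyone calls.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Port:
    """A TCP port assumed to be in 1..65535 (hypothetical domain type)."""
    value: int


def parse_port(raw: str) -> Port:
    """Narrow an untrusted string into a Port, or raise ValueError."""
    try:
        number = int(raw.strip())
    except ValueError:
        raise ValueError(f"not an integer: {raw!r}") from None
    if not 1 <= number <= 65535:
        raise ValueError(f"port out of range: {number}")
    return Port(number)


def listen_address(host: str, port: Port) -> str:
    # No re-checking here: holding a Port is the evidence that parsing succeeded.
    return f"{host}:{port.value}"
```

Calling `listen_address("localhost", parse_port("8080"))` does the check once at the edge; everything downstream takes a `Port` and never sees the raw string.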

Monday, June 29, 2026 (6 reads)
Architecture
GLM-5.2 Shows Agent Results Depend on the Harness
— Semgrep / Hacker News

GLM-5.2 scores 39% F1 on IDOR detection, ahead of Claude Code's 32%, but Semgrep's own multimodal harness reaches 53-61%; the useful comparison is the full system of model, context selection, output parsing, and execution loop.

Security
`.agentignore` Is Not a Security Boundary
— GitHub issue / Hacker News

Ignore files reduce noise and express intent, but if the agent process can still read a secret, tool output, search results, and logs can leak it; the real boundary has to come from the OS, containers, VMs, or least-privilege credentials.

Workflow
Agents Belong in the Human Loop
— Jon Udell via Simon Willison

Jon Udell argues against reducing people to approval buttons; the better design keeps human plans, queues, review, and history as the main loop, with agents joining through visible, recoverable small steps.

Systems
WAL-RUS Is About Predictable Memory, Not Speed
— ClickHouse / Hacker News

ClickHouse's Rust rewrite of its WAL archiver matters less as a generic speed story than as a resource-predictability story: under WAL-heavy load, virtual memory falls from nearly 2.8GB to under 1GB.

Open Models
Open Models Are Splitting by Business Motive
— Interconnects

Open-weight releases are no longer a single movement led by a few players; pure model makers, Big Tech, product companies, and sovereign AI efforts all open models for different economic reasons.

Strategy
AI Competition Turns Capability Lead Into a Price War
— Gary Marcus

Gary Marcus reads China's model catch-up as a no-moat story: more competitors, lower token prices, thinner margins, and a costly paradigm whose capability lead may not become a durable business moat.

Wednesday, June 24, 2026 (14 reads)
Agents
The Coming Loop
— Armin Ronacher

On the two loops inside agentic coding — the inner agent loop that ends when the model says "done," and the outer harness loop that decides whether to keep going — and why the second is remarkable on disposable, verifiable work but corrosive on code meant to last.

Engineering
Slow Down to Speed Up
— The Pragmatic Engineer

How the November 2025 agents multiplied code output while human review stayed flat — and how Meta's largest-ever incident traced back to AI-written, AI-reviewed code shipping past a gutted Trust & Safety team.

Agents
Coinbase cut idea-to-production by 90%
— Cursor

How Coinbase compressed its delivery cycle from 20 days to 1.8 days using Plan Mode and five-to-seven parallel agents, with 75% of pull requests now opened by an agent.

Security
Prompt Injection as Role Confusion
— Simon Willison

A research finding that models infer who is speaking from a text's style rather than its role tags — and that rewriting an attack to read slightly off-format drops its success rate from 61% to 10%.

Security
After Mythos: AI Red-Teaming with Gray Swan
— Latent Space

On an automated red-teamer that now out-ranks human professionals, the finding that larger models are not automatically safer, and the "Lethal Trifecta" — untrusted input, private data, and an exfiltration path together.

Open Models
GLM-5.2 Is the Step Change for Open Models
— Interconnects

Why GLM-5.2 is the first open-weight model that works as a general agent inside a Claude Code-style harness, narrowing the US–China gap to about 6.8 months at a fraction of the price.

Agents
CUGA: An Open Agent Harness
— Hugging Face / IBM

IBM's open harness tops AppWorld and WebArena on an open-weight model by moving planning, state, and reflection into the harness, leaving developers to write only tools and prompts.

Tools
Oak: A Version Control System Rebuilt for Agents
— oak.space

Flat Mercurial-style manifests and lazy mounting give an agent seconds-to-first-edit on a multi-GB monorepo without cloning the whole thing — at the cost of leaving the Git ecosystem behind.

Architecture
In Praise of memcached
— jchri.st

An argument that memcached suits caching precisely because it does less — no persistence, no clustering — forcing correct "cache can vanish" semantics and sidestepping the Redis-as-database trap.
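
A rough sketch of what "cache can vanish" semantics look like in calling code. `VolatileCache` is a hypothetical in-memory stand-in, not a real memcached client, and `read_through` is an illustrative name. The point is only that every read has a fallback to the source of truth.

```python
from typing import Callable, Dict, Optional


class VolatileCache:
    """In-memory stand-in for a cache server; entries may disappear at any time."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def set(self, key: str, value: str) -> None:
        self._data[key] = value

    def flush(self) -> None:
        # Simulates a restart or mass eviction: everything is gone.
        self._data.clear()


def read_through(cache: VolatileCache, key: str, load: Callable[[str], str]) -> str:
    """Return the cached value, reloading from the source of truth on a miss."""
    cached = cache.get(key)
    if cached is not None:
        return cached
    value = load(key)  # the authoritative store, e.g. a database query
    cache.set(key, value)
    return value
```

Because nothing lives only in the cache, calling `flush()` costs one extra `load` per key and loses no data.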

Game Dev
The Low-Tech AI of Elden Ring
— nega.tv

How FromSoftware builds boss behavior without planning algorithms — a pushdown-automaton goal stack, weighted-random action selection, and interrupt callbacks that keep designers in full control.
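
The three pieces named above can be sketched in a few lines. This is an illustrative toy under my own assumptions, not FromSoftware's code: `GoalStack`, `choose_and_push`, and `interrupt` are hypothetical names, and goals are plain strings.

```python
import random
from typing import List, Tuple


class GoalStack:
    """Pushdown-style goal stack: only the top goal is active."""

    def __init__(self, rng: random.Random) -> None:
        self._stack: List[str] = []
        self._rng = rng

    def push(self, goal: str) -> None:
        self._stack.append(goal)

    def pop(self) -> str:
        return self._stack.pop()

    def current(self) -> str:
        return self._stack[-1]

    def choose_and_push(self, options: List[Tuple[str, float]]) -> str:
        """Pick one goal by designer-set weight and make it the active goal."""
        goals = [goal for goal, _ in options]
        weights = [weight for _, weight in options]
        goal = self._rng.choices(goals, weights=weights, k=1)[0]
        self.push(goal)
        return goal

    def interrupt(self, goal: str) -> None:
        """Interrupt callback: push a reaction on top; the old goal resumes after pop."""
        self.push(goal)
```

A designer tunes behavior by editing the weight table passed to `choose_and_push`, and an event such as the player healing calls `interrupt`; once the reaction is popped, the previous goal is active again.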

Infrastructure
Why American Data Centers Can't Plug In
— Works in Progress

Why the bottleneck for AI data centers is not power but a first-come interconnection queue that fills with speculative projects — and how auctioning slots and pricing flexibility could clear it.

Cities
Why the West Stopped Making Land
— Works in Progress

How land reclamation stalled across the West around 1970 — not by prohibition, but by litigable environmental review that pushed single-project approval times into decades.
