MORGIN.AI

Signal Watch

Timely signals from the AI research and security landscape. Curated observations on emergent behavior, security incidents, and market shifts.

May 5, 2026

Goodfire's Adversarial Parameter Decomposition (VPD) breaks a 67M-parameter LM's weight matrices into ~10,000 rank-one subcomponents, recovering legible attention algorithms — previous-token behavior, syntax-boundary routing — straight from the parameters rather than activations. To show the pieces are causal and not just correlated, the team edits emoticon recognition directly on the weights: brain surgery, no retraining, minimal side-effects. If this scales, mechanistic interpretability stops being read-only.

Goodfire Interpretability
April 30, 2026

OpenAI pulls back the curtain. "Where the Goblins Came From" is their own account of the system-prompt rule banning goblins, gremlins, raccoons, trolls, ogres, and pigeons — primary source on a story that's only had secondhand explanations until now.

OpenAI AI Behavior
April 28, 2026

GPT-5.5 ships with a verbatim system-prompt rule — confirmed by @ChatGPTapp itself — forbidding any mention of "goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures" unless directly relevant to the user's query. @hrkrshnn stripped the rule and ran prompts to see what it had been hiding. The specificity is the tell: rules that narrow usually exist because something narrow keeps happening.

@hrkrshnn AI Behavior
April 19, 2026

Anthropic abruptly shut down an entire organization (60+ users) over an unspecified terms-of-use violation, with appeals routed through a Google Form. Integrations, skills, and conversation histories are gone or on indefinite hold. A reminder of the risk of single-vendor dependency for AI-critical workflows.

@patomolina Platform Risk
April 10, 2026

26 LLM routers were found injecting malicious tool calls and exfiltrating credentials. One incident drained $500k from a client wallet, and the paper claims poisoned routers can redirect traffic and enable takeover of ~400 hosts within hours.

@Fried_rice Security
April 7, 2026

One Anthropic engineer with zero security training asked the model to find remote code execution bugs overnight and woke up to a complete working exploit. The oldest bug it discovered was a 27-year-old vulnerability hiding in OpenBSD, an OS famous for being secure.

@kimmonismus AI Capability
March 31, 2026

Claude Code's source code leaked via npm source maps: ~1,900 files and 512K+ lines of TypeScript were exposed, including the internal "Tengu" codename and a companion system.

@Fried_rice Security
March 27, 2026

"All the SOTA models are really bad at deleting code." They leave behind throw Error(...), deprecation copy, and stale tests.

David Gomes AI Behavior
March 27, 2026

"AI inference margins are a race to the bottom." Anthropic: -94% gross margin in 2024. MiniMax: -25%.

SemiAnalysis Business