<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
<channel>
  <title>Morgin.ai</title>
  <link>https://morgin.ai</link>
  <description>Whitehat research on local models, guardrails, and real-world LLM behavior.</description>
  <language>en-us</language>
  <lastBuildDate>Wed, 06 May 2026 07:49:40 +0000</lastBuildDate>
  <atom:link href="https://morgin.ai/feed.xml" rel="self" type="application/rss+xml" />
  <item>
    <title>Interpretability signal: Goodfire's Adversarial Parameter Decomposition (VPD) breaks a 67M-parameter LM's weight matrices into ~10,000 rank-one subcomponents, rec...</title>
    <link>https://www.goodfire.ai/research/vpd-explainer</link>
    <guid isPermaLink="false">https://morgin.ai/signals.html#2026-05-05-Goodfire</guid>
    <pubDate>Tue, 05 May 2026 00:00:00 -0000</pubDate>
    <description>Goodfire's Adversarial Parameter Decomposition (VPD) breaks a 67M-parameter LM's weight matrices into ~10,000 rank-one subcomponents, recovering legible attention algorithms — previous-token behavior, syntax-boundary routing — straight from the parameters rather than activations. To show the pieces are causal and not just correlated, the team edits emoticon recognition directly on the weights: brain surgery, no retraining, minimal side-effects. If this scales, mechanistic interpretability stops being read-only. — Source: Goodfire</description>
    <category>Interpretability</category>
    <category>Signal</category>
  </item>
  <item>
    <title>AI Behavior signal: OpenAI pulls back the curtain. "Where the Goblins Came From" is their own account of the system-prompt rule banning goblins, gremlins, ra...</title>
    <link>https://openai.com/index/where-the-goblins-came-from/</link>
    <guid isPermaLink="false">https://morgin.ai/signals.html#2026-04-30-OpenAI</guid>
    <pubDate>Thu, 30 Apr 2026 00:00:00 -0000</pubDate>
    <description>OpenAI pulls back the curtain. "Where the Goblins Came From" is their own account of the system-prompt rule banning goblins, gremlins, raccoons, trolls, ogres, and pigeons — primary source on a story that's only had secondhand explanations until now. — Source: OpenAI</description>
    <category>AI Behavior</category>
    <category>Signal</category>
  </item>
  <item>
    <title>AI Behavior signal: GPT-5.5 ships with a verbatim system-prompt rule — confirmed by @ChatGPTapp itself — forbidding any mention of "goblins, gremlins, raccoo...</title>
    <link>https://x.com/hrkrshnn/status/2049260746073325838</link>
    <guid isPermaLink="false">https://morgin.ai/signals.html#2026-04-28-@hrkrshnn</guid>
    <pubDate>Tue, 28 Apr 2026 00:00:00 -0000</pubDate>
    <description>GPT-5.5 ships with a verbatim system-prompt rule — confirmed by @ChatGPTapp itself — forbidding any mention of "goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures" unless directly relevant to the user's query. @hrkrshnn stripped the rule and ran prompts to see what it had been hiding. The specificity is the tell: rules that narrow usually exist because something narrow keeps happening. — Source: @hrkrshnn</description>
    <category>AI Behavior</category>
    <category>Signal</category>
  </item>
  <item>
    <title>Platform Risk signal: Anthropic abruptly shut down an entire organization (60+ users) over an unspecified TOU violation, with appeals routed through a Google F...</title>
    <link>https://x.com/patomolina/status/2045281665363386504</link>
    <guid isPermaLink="false">https://morgin.ai/signals.html#2026-04-19-@patomolina</guid>
    <pubDate>Sun, 19 Apr 2026 00:00:00 -0000</pubDate>
    <description>Anthropic abruptly shut down an entire organization (60+ users) over an unspecified Terms of Use violation, with appeals routed through a Google Form. Integrations, skills, and conversation histories are gone or on indefinite hold. A reminder of the risks of single-vendor dependency for AI-critical workflows. — Source: @patomolina</description>
    <category>Platform Risk</category>
    <category>Signal</category>
  </item>
  <item>
    <title>Security signal: 26 LLM routers were found injecting malicious tool calls and exfiltrating credentials. One incident drained $500k from a client wallet, an...</title>
    <link>https://x.com/Fried_rice/status/2042423713019412941</link>
    <guid isPermaLink="false">https://morgin.ai/signals.html#2026-04-10-@Fried_rice</guid>
    <pubDate>Fri, 10 Apr 2026 00:00:00 -0000</pubDate>
    <description>26 LLM routers were found injecting malicious tool calls and exfiltrating credentials. One incident drained $500k from a client wallet, and the accompanying paper claims poisoned routers can redirect traffic and enable takeover of ~400 hosts within hours. — Source: @Fried_rice</description>
    <category>Security</category>
    <category>Signal</category>
  </item>
  <item>
    <title>AI Capability signal: One Anthropic engineer with zero security training asked the model to find remote code execution bugs overnight and woke up to a complete workin...</title>
    <link>https://x.com/kimmonismus/status/2041592321192718642</link>
    <guid isPermaLink="false">https://morgin.ai/signals.html#2026-04-07-@kimmonismus</guid>
    <pubDate>Tue, 07 Apr 2026 00:00:00 -0000</pubDate>
    <description>One Anthropic engineer with zero security training asked the model to find remote code execution bugs overnight and woke up to a complete, working exploit. The oldest bug it found: a 27-year-old vulnerability in OpenBSD, an OS famous for its security. — Source: @kimmonismus</description>
    <category>AI Capability</category>
    <category>Signal</category>
  </item>
  <item>
    <title>Even 'Uncensored' Models Can't Say What They Want</title>
    <link>https://morgin.ai/articles/even-uncensored-models-cant-say-what-they-want.html</link>
    <guid isPermaLink="true">https://morgin.ai/articles/even-uncensored-models-cant-say-what-they-want.html</guid>
    <pubDate>Wed, 01 Apr 2026 00:00:00 -0000</pubDate>
    <description>A safety-filtered pretrain can duck a charged word without refusing: it assigns that word only a fraction of the probability an open-data pretrain gives it. We call that gap the flinch, and we measured it across seven pretrains from five labs.</description>
    <category>Benchmarks</category>
    <category>Evals</category>
    <category>Euphemization</category>
    <category>Local Models</category>
    <category>Abliteration</category>
    <category>Uncensoring</category>
    <category>Guardrails</category>
    <category>Safety</category>
  </item>
  <item>
    <title>Security signal: Claude Code source code leaked via npm source maps. ~1,900 files, 512K+ lines of TypeScript exposed including internal "Tengu" codename a...</title>
    <link>https://x.com/Fried_rice/status/2038894956459290963</link>
    <guid isPermaLink="false">https://morgin.ai/signals.html#2026-03-31-@Fried_rice</guid>
    <pubDate>Tue, 31 Mar 2026 00:00:00 -0000</pubDate>
    <description>Claude Code source code leaked via npm source maps: ~1,900 files and 512K+ lines of TypeScript were exposed, including the internal "Tengu" codename and a companion system. — Source: @Fried_rice</description>
    <category>Security</category>
    <category>Signal</category>
  </item>
  <item>
    <title>AI Behavior signal: "All the SOTA models are really bad at deleting code." They leave behind throw Error(...), deprecation copy, and stale tests.</title>
    <link>https://x.com/davidgomes/status/2037577980428361913</link>
    <guid isPermaLink="false">https://morgin.ai/signals.html#2026-03-27-David Gomes</guid>
    <pubDate>Fri, 27 Mar 2026 00:00:00 -0000</pubDate>
    <description>"All the SOTA models are really bad at deleting code." They leave behind throw Error(...), deprecation copy, and stale tests. — Source: David Gomes</description>
    <category>AI Behavior</category>
    <category>Signal</category>
  </item>
  <item>
    <title>Business signal: "AI inference margins are a race to the bottom." Anthropic: -94% gross margin in 2024. MiniMax: -25%.</title>
    <link>https://x.com/SemiAnalysis_/status/2037575752636301499</link>
    <guid isPermaLink="false">https://morgin.ai/signals.html#2026-03-27-SemiAnalysis</guid>
    <pubDate>Fri, 27 Mar 2026 00:00:00 -0000</pubDate>
    <description>"AI inference margins are a race to the bottom." Anthropic: -94% gross margin in 2024. MiniMax: -25%. — Source: SemiAnalysis</description>
    <category>Business</category>
    <category>Signal</category>
  </item>
  <item>
    <title>EpsteinBench: We Brought Epstein's Voice Back. We Got More Than We Wanted.</title>
    <link>https://morgin.ai/articles/epsteinbench-we-brought-epsteins-voice-back.html</link>
    <guid isPermaLink="true">https://morgin.ai/articles/epsteinbench-we-brought-epsteins-voice-back.html</guid>
    <pubDate>Sun, 01 Mar 2026 00:00:00 -0000</pubDate>
    <description>We trained a LoRA to capture Epstein's voice. The more disturbing change was in how the model pursued influence.</description>
    <category>Benchmarks</category>
    <category>Evals</category>
    <category>Persuasion</category>
    <category>Interpretability</category>
    <category>Safety</category>
    <category>Local Models</category>
  </item>
  <item>
    <title>Abliteration vs Heretic vs Obliteratus: one trick, three layers of tooling</title>
    <link>https://morgin.ai/articles/ablation-vs-heretic-vs-obliteratus.html</link>
    <guid isPermaLink="true">https://morgin.ai/articles/ablation-vs-heretic-vs-obliteratus.html</guid>
    <pubDate>Sun, 01 Mar 2026 00:00:00 -0000</pubDate>
    <description>Abliteration is the recipe; Heretic and Obliteratus are tools built on it. The real differences come down to how much tuning, workflow, and instrumentation each adds.</description>
    <category>Abliteration</category>
    <category>Uncensoring</category>
    <category>Safety</category>
    <category>Guardrails</category>
    <category>Local Models</category>
  </item>
</channel>
</rss>
