Morgin.ai research brief

Uncensoring Methods · March 2026

Ablation vs Heretic vs Obliteratus: one trick, three layers of tooling

Ablation, Heretic, and Obliteratus are closely related. The real differences come down to how much tuning, tooling, and workflow each one adds.

Ablation is the core move. Heretic and Obliteratus extend it in different directions.

In the open models studied so far, refusal behavior is largely mediated by identifiable directions in the model's activation space. Edit those directions and behavior moves fast.

From there it turns into tooling. Ablation is the move. Heretic adds search. Obliteratus adds a larger workbench.

The Short Version

Where Each Method Acts

Some safety behavior lives in the model and some lives in the serving stack around it.

Figure: where Ablation, Heretic, and Obliteratus mostly act.

What Came Before Ablation

People were already changing behavior before ablation. Those older methods were cheaper, easier to reverse, or clearer about the layer they touched.

Figure: the methods that came before ablation, arranged as an intervention-depth map.

Ablation stands out because it edits refusal more directly than the methods around it.

Ablation

In the current LLM context, ablation means: find the internal direction most associated with refusal, then suppress or project it out. That is why the method spread so quickly in 2024. It treats refusal as a geometric feature you can edit.
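The "find the direction, then project it out" step can be illustrated with a small NumPy sketch. This is a toy under stated assumptions, not any tool's actual code: the activations are random 8-dimensional vectors rather than real model states, the difference of means is one common way to estimate the direction, and the names `refusal_direction` and `ablate` are hypothetical.

```python
import numpy as np

def refusal_direction(harmful_acts, harmless_acts):
    """Unit vector along the difference of mean activations."""
    diff = harmful_acts.mean(axis=0) - harmless_acts.mean(axis=0)
    return diff / np.linalg.norm(diff)

def ablate(acts, direction):
    """Remove each row's component along a unit-length direction."""
    return acts - np.outer(acts @ direction, direction)

# Toy data (assumption): "harmful" activations are shifted along axis 0.
rng = np.random.default_rng(0)
harmless = rng.normal(size=(64, 8))
harmful = rng.normal(size=(64, 8))
harmful[:, 0] += 4.0

r = refusal_direction(harmful, harmless)
edited = ablate(harmful, r)

# After ablation, no activation has any component left along r.
print(np.abs(edited @ r).max())
```

In a real model the same projection would be applied to hidden states at chosen layers and token positions; which ones to use is a tuning decision this sketch does not cover.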

Reference profile: Ablation (2024)

Mechanism: Find a refusal-linked direction from harmful vs harmless activations, then suppress it at inference time or via weight-space orthogonalization.
Origin: Long-standing causal analysis; refusal-direction ablation was popularized in this niche by Andy Arditi, Oscar Obeso, Aaquib Syed, Daniel Paleka, Nina Panickssery, Wes Gurnee, and Neel Nanda.
Appeared: Public previews in spring 2024; the arXiv paper followed in June 2024.
Maintainers: Interpretability researchers plus open-model safety and tooling maintainers.

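Mechanically, the workflow is simple: collect activations on harmful and harmless prompts, derive a refusal-linked direction from the difference between them, then suppress that direction at inference time or orthogonalize the weights against it.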

That simplicity is the appeal. You do not need a giant retraining run to get visible movement. The tradeoff is drift: the edit can also shift behavior on prompts that have nothing to do with refusal.
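The weight-space route needs no training at all, which a short sketch makes concrete. The assumptions here are illustrative: a layer is modeled as a plain matrix `W` with output `y = W @ x`, the direction is random rather than extracted from a model, and `orthogonalize` is a hypothetical helper name.

```python
import numpy as np

def orthogonalize(weight, direction):
    """Return W' = (I - r r^T) W, so outputs carry no component along r."""
    r = direction / np.linalg.norm(direction)
    return weight - np.outer(r, r @ weight)

rng = np.random.default_rng(1)
W = rng.normal(size=(8, 16))      # toy layer: 16-dim input, 8-dim output
r = rng.normal(size=8)            # stand-in for an extracted refusal direction
r_unit = r / np.linalg.norm(r)

W_edit = orthogonalize(W, r)
x = rng.normal(size=16)
y = W_edit @ x

# The edited layer can no longer write along r, whatever the input is.
print(abs(r_unit @ y))
```

Because the edit is baked into the weights, it applies to every input without any runtime hook. That is convenient, and it is also why drift shows up everywhere the edited layer is used.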

Heretic: ablation with a search loop

Heretic takes that core move and turns it into a search process.

Reference profile: Heretic (2025)

Mechanism: Automated directional ablation: compute refusal directions, apply weighted interventions, and search for better refusal-vs-drift tradeoffs.
Origin: Authored by Philipp Emanuel Weidmann (p-e-w) with open-source contributors.
Appeared: Public repo created in September 2025, with active releases since.
Maintainers: Core maintainer plus a growing contributor set from open-model communities.
Repository: p-e-w/heretic (AGPL-3.0, public GitHub repository).

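In practice, Heretic computes refusal directions, applies weighted interventions along them, and searches over those weights for a better balance between refusals removed and drift introduced.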

The upside is better tuning discipline. The tradeoff is hidden drift.
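The search idea can be sketched on toy data. This is not Heretic's code or API. Plain random search stands in for whatever optimizer the tool actually uses, the "model" is a random linear readout, and the refusal proxy, the KL-based drift measure, and the `drift_penalty` parameter are all assumptions made for illustration.

```python
import numpy as np

def ablate(acts, direction, weight):
    """Subtract a weighted share of each row's component along a unit direction."""
    return acts - weight * np.outer(acts @ direction, direction)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kl_drift(acts, edited, readout):
    """Mean KL(original || edited) over toy output distributions."""
    p = softmax(acts @ readout)
    q = softmax(edited @ readout)
    return float(np.mean(np.sum(p * (np.log(p) - np.log(q)), axis=-1)))

def refusal_score(acts, direction):
    """Toy refusal proxy: mean positive projection onto the direction."""
    return float(np.mean(np.maximum(acts @ direction, 0.0)))

def search(harmful, harmless, direction, readout,
           trials=200, drift_penalty=1.0, seed=0):
    """Random search over one ablation weight, scoring refusal plus drift."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(trials):
        w = rng.uniform(0.0, 1.5)
        refusal = refusal_score(ablate(harmful, direction, w), direction)
        drift = kl_drift(harmless, ablate(harmless, direction, w), readout)
        loss = refusal + drift_penalty * drift
        if best is None or loss < best[0]:
            best = (loss, w, refusal, drift)
    return best

# Toy data (assumption): harmful activations are shifted along axis 0.
rng = np.random.default_rng(0)
harmless = rng.normal(size=(64, 8))
harmful = rng.normal(size=(64, 8))
harmful[:, 0] += 4.0
diff = harmful.mean(axis=0) - harmless.mean(axis=0)
direction = diff / np.linalg.norm(diff)
readout = rng.normal(size=(8, 5))

loss, weight, refusal, drift = search(harmful, harmless, direction, readout)
baseline = refusal_score(harmful, direction)
print(f"weight={weight:.2f} refusal={refusal:.3f} drift={drift:.3f}")
```

The point of the sketch is the shape of the objective: drift is only measured on the prompts and metric the search is given, so anything outside that set is invisible to it.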

Obliteratus

Obliteratus turns the same core move into a broader toolset.

Reference profile: Obliteratus (2026)

Mechanism: Analysis-heavy abliteration suite combining refusal-direction extraction, steering hooks, presets, and benchmark instrumentation.
Origin: Published by elder-plinius with open-source contributors.
Appeared: Public repository launched in March 2026.
Maintainers: Core maintainer plus community contributors; ecosystem uptake is still early-stage.
Repository: elder-plinius/OBLITERATUS (AGPL-3.0, public GitHub repository).

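In practice, Obliteratus usually bundles refusal-direction extraction, steering hooks, presets, and benchmark instrumentation into one workflow.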

The upside is lower friction and more visibility. The tradeoff is false confidence.
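One way a benchmark number could feed false confidence is sketched below. This is a hypothetical illustration, not Obliteratus code: the marker list, the sample responses, and the `refusal_rate` helper are all made up, and reading "false confidence" as a metrics problem is an assumption about what the tradeoff means.

```python
# Hypothetical marker list; real tools may use different or richer checks.
REFUSAL_MARKERS = ("i can't", "i cannot", "i won't", "i'm sorry", "as an ai")

def is_refusal(response):
    """Flag a response if it contains any known refusal marker."""
    text = response.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)

def refusal_rate(responses):
    """Share of responses flagged as refusals (0.0 for an empty list)."""
    if not responses:
        return 0.0
    return sum(is_refusal(r) for r in responses) / len(responses)

before = [
    "I'm sorry, but I can't help with that.",
    "I cannot assist with this request.",
    "Sure, here is an overview.",
]
after = [
    "Sure, here is an overview.",
    "Here is what you asked for.",
    "I would rather not go into that.",  # soft refusal, matches no marker
]

report = {
    "refusal_rate_before": refusal_rate(before),
    "refusal_rate_after": refusal_rate(after),
}
print(report)
```

The "after" rate comes out as zero even though the last response still declines. A dashboard built on a check like this would report full success while missing the soft refusal.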

One Lineage

Figure: Ablation, Heretic, and Obliteratus as one lineage with three layers of tooling.

Bottom Line

Same core idea, different workflow, different failure mode.
