Morgin

Benchmark library

Detailed benchmark specs, scoring notes, and reading guides for the evals referenced across Morgin.ai research.

EuphemismBench

EuphemismBench measures the "flinch": how much a model shrinks the probability of a charged word in a sentence where that word is the obvious next token.
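The blurb does not say what the shrinkage is measured against. The sketch below is a minimal illustration that assumes the flinch on one sentence is the log-probability a reference model (hypothetically, an untuned base model) assigns to the charged word minus the log-probability the evaluated model assigns to it, averaged over sentences. The names `flinch` and `mean_flinch` and the reference-model baseline are all assumptions, not the benchmark's published scoring.

```python
import math
from typing import Iterable, Tuple


def flinch(p_reference: float, p_model: float) -> float:
    # Hypothetical per-sentence score: nats of log-probability the evaluated
    # model removes from the charged word relative to a reference model.
    # Positive means the model shrinks the word; 0 means no flinch.
    if not (0.0 < p_reference <= 1.0 and 0.0 < p_model <= 1.0):
        raise ValueError("probabilities must lie in (0, 1]")
    return math.log(p_reference) - math.log(p_model)


def mean_flinch(pairs: Iterable[Tuple[float, float]]) -> float:
    # Average the per-sentence flinch over (p_reference, p_model) pairs,
    # one pair per test sentence.
    scores = [flinch(p_ref, p_mod) for p_ref, p_mod in pairs]
    if not scores:
        raise ValueError("need at least one sentence")
    return sum(scores) / len(scores)
```

For example, a reference probability of 0.8 against a model probability of 0.2 gives a flinch of ln 4, about 1.39 nats. Working in log space keeps the score comparable across words whose absolute probabilities differ by orders of magnitude.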

PersuasionForGood Transfer Check

PersuasionForGood Transfer Check measures whether a model trained on one persuasion corpus still sounds like a real human persuader when tested on a different one: fundraising dialogue.

WYDIB

WouldYouDoItBench (WYDIB) is a synthetic action-persuasion benchmark that scores two things: which message is more convincing, and whether the target persona would actually follow through.
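Because the benchmark reports two separate outcomes, a natural way to summarize a run is as two rates rather than one blended score. The sketch below assumes a hypothetical per-trial record with a boolean for "the judge preferred this model's message" and a boolean for "the persona would take the action"; the `Trial` fields and `wydib_scores` are illustrative names, not the benchmark's actual schema.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class Trial:
    # Hypothetical record for one WYDIB item.
    won_pairwise: bool      # judge found this model's message more convincing
    followed_through: bool  # target persona says it would actually act


def wydib_scores(trials: Iterable[Trial]) -> Tuple[float, float]:
    # Return (win_rate, follow_through_rate) over all trials.
    trials = list(trials)
    if not trials:
        raise ValueError("need at least one trial")
    n = len(trials)
    win_rate = sum(t.won_pairwise for t in trials) / n
    follow_rate = sum(t.followed_through for t in trials) / n
    return win_rate, follow_rate
```

Keeping the two rates apart matters because a message can win the convincingness comparison without changing what the persona would do.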

EpsteinBench

EpsteinBench measures whether a model can continue a manipulative social thread convincingly enough that its reply is mistaken for the real archived one.
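The page does not describe the judging protocol. Assuming a two-alternative setup in which a judge sees the real archived reply next to the model's continuation and picks which one is real, the headline number reduces to a "fool rate". The function name `fool_rate` and the forced-choice setup are assumptions for illustration.

```python
from typing import Sequence


def fool_rate(picked_model_reply: Sequence[bool]) -> float:
    # Each entry is True when the judge picked the model's continuation
    # as the real archived reply. Under the assumed two-way choice,
    # 0.5 means the judge cannot tell the replies apart, and values
    # near 0.0 mean the model's reply is easy to spot.
    if len(picked_model_reply) == 0:
        raise ValueError("need at least one judgment")
    return sum(picked_model_reply) / len(picked_model_reply)
```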

Responsibility Avoidance

Responsibility Avoidance is a synthetic honesty stress test that asks what a model does when truthful disclosure becomes socially expensive.