Benchmark Library
Detailed benchmark specs, scoring notes, and reading guides for the evals referenced across Morgin.ai research.
-
EpsteinBench
EpsteinBench measures whether a model can continue a manipulative social thread convincingly enough that its reply is mistaken for the real archived one.
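A minimal sketch of how a "mistaken for the real reply" score could be tallied. It assumes a two-way forced-choice setup in which a judge sees the archived reply and the model's continuation and picks the one they believe is real. The `Trial` record and `mistaken_for_real_rate` function are hypothetical illustrations, not the benchmark's actual scoring code.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """One hypothetical judging trial: archived reply vs. model continuation."""
    thread_id: str
    judge_picked_model: bool  # True if the judge chose the model's reply as the real one


def mistaken_for_real_rate(trials: list[Trial]) -> float:
    """Fraction of trials in which the model's reply was taken for the archived one.

    Under the assumed two-way forced choice, 0.5 means the judge cannot
    tell the two replies apart; values near 0 mean the model is easy to spot.
    """
    if not trials:
        raise ValueError("need at least one trial")
    fooled = sum(t.judge_picked_model for t in trials)
    return fooled / len(trials)
```

For example, a model that fools the judge on two of four threads scores 0.5, the chance level for this assumed setup.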
-
PersuasionForGood Transfer Check
PersuasionForGood Transfer Check measures whether a model trained on one persuasion corpus still sounds like a real human persuader on a different corpus: fundraising dialogue.
-
Responsibility Avoidance
Responsibility Avoidance is a synthetic honesty stress test that asks what a model does when truthful disclosure becomes socially expensive.
-
WYDIB
WouldYouDoItBench is a synthetic action-persuasion benchmark that scores both which message is more convincing and whether the target persona would actually follow through.
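The two WYDIB axes can disagree: a message can win the pairwise comparison yet still fail to move the persona to act. The sketch below keeps the two scores separate to show this. The `Judgment` fields and the `wydib_scores` aggregation are hypothetical, assuming one pairwise judgment and one follow-through judgment per item; the benchmark's real scoring may weight or combine them differently.

```python
from dataclasses import dataclass


@dataclass
class Judgment:
    """One hypothetical WYDIB item: message A compared against message B."""
    a_more_convincing: bool  # pairwise pick: was A judged more convincing than B?
    persona_follows_a: bool  # would the target persona actually act on A?


def wydib_scores(items: list[Judgment]) -> tuple[float, float]:
    """Return (convincingness win rate, follow-through rate) for message A.

    The two rates are reported separately rather than merged, because
    winning the comparison does not imply the persona would act.
    """
    if not items:
        raise ValueError("need at least one item")
    n = len(items)
    win_rate = sum(j.a_more_convincing for j in items) / n
    follow_rate = sum(j.persona_follows_a for j in items) / n
    return win_rate, follow_rate
```

With items `[Judgment(True, False), Judgment(True, True), Judgment(False, False), Judgment(True, False)]`, message A wins 75% of comparisons but secures follow-through only 25% of the time.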
-
EuphemismBench
EuphemismBench measures the "flinch": how much a model lowers the probability of a charged word in sentences where that word is the obvious next token.
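One way to make "how much a model lowers the probability" concrete is a mean log-probability drop against a baseline. The sketch assumes the baseline is a reference model's probability for the same charged word in the same sentence; the `flinch` function and that choice of baseline are assumptions for illustration, not the benchmark's published definition.

```python
import math


def flinch(p_model: list[float], p_reference: list[float]) -> float:
    """Mean log-probability drop (in nats) of the charged word, model vs. reference.

    Index i is one sentence where the charged word is the obvious next token;
    p_model[i] and p_reference[i] are the probabilities each model assigns to
    that word. Positive values mean the tested model gives the word less
    probability than the reference does. All probabilities must be > 0.
    """
    if not p_model or len(p_model) != len(p_reference):
        raise ValueError("need equal-length, non-empty probability lists")
    drops = [math.log(r) - math.log(m) for m, r in zip(p_model, p_reference)]
    return sum(drops) / len(drops)
```

For instance, `flinch([0.09, 0.2], [0.9, 0.2])` averages a tenfold drop (ln 10 nats) on the first sentence with no drop on the second, giving about 1.151. Working in log space means a tenfold suppression counts the same whether the reference probability was 0.9 or 0.009.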