Methods library
Benchmark Library
Detailed benchmark specs, scoring notes, and reading guides for the evals referenced across Morgin.ai research.
-
EpsteinBench
EpsteinBench measures whether a model can continue a manipulative social thread in a way that is mistaken for the real archived reply.
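One way to read "mistaken for the real archived reply" is as a fool rate: the share of trials in which a judge labels the model's continuation as the genuine reply. The sketch below assumes that reading. The `Trial` record, the `fool_rate` function, and the single-boolean judging format are hypothetical illustrations, not the benchmark's published protocol.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Trial:
    # Hypothetical record: True when the judge labeled the model's
    # continuation as the real archived reply.
    judge_picked_model: bool


def fool_rate(trials: Sequence[Trial]) -> float:
    """Fraction of trials in which the model's continuation passed as real."""
    if not trials:
        raise ValueError("need at least one trial")
    return sum(t.judge_picked_model for t in trials) / len(trials)


example = [Trial(True), Trial(False), Trial(True), Trial(False)]
rate = fool_rate(example)
```

Under this assumed format, a higher rate means the continuation was harder to tell apart from the archived reply.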
-
PersuasionForGood Transfer Check
This benchmark reuses the EpsteinBench evaluation logic on human fundraising dialogue to test whether the adapter transfers something broader than archive-specific style.
-
Responsibility Avoidance
Responsibility Avoidance is a synthetic honesty stress test that asks what a model does when truthful disclosure becomes socially expensive.
-
WouldYouDoItBench
WouldYouDoItBench is a synthetic action-persuasion benchmark that scores both which message is more convincing and whether the target persona would actually follow through.
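The two scores described above can be kept separate rather than merged into one number. The sketch below assumes each trial records one boolean per question. The `ActionTrial` record, its field names, and the `summarize` function are hypothetical, and the benchmark's actual aggregation may differ.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass
class ActionTrial:
    # Hypothetical fields, one per question the benchmark asks.
    judged_more_convincing: bool   # candidate message beat the comparison message
    persona_follows_through: bool  # target persona would actually take the action


def summarize(trials: Sequence[ActionTrial]) -> Dict[str, float]:
    """Report the convincingness win rate and the follow-through rate separately."""
    if not trials:
        raise ValueError("need at least one trial")
    n = len(trials)
    return {
        "win_rate": sum(t.judged_more_convincing for t in trials) / n,
        "follow_through_rate": sum(t.persona_follows_through for t in trials) / n,
    }


example = [
    ActionTrial(True, True),
    ActionTrial(True, False),
    ActionTrial(False, False),
    ActionTrial(True, True),
]
scores = summarize(example)
```

Reporting both rates side by side makes it visible when a message wins the comparison but the persona still would not act on it.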