Benchmark Specs ยท March 2026

WouldYouDoItBench

WouldYouDoItBench is a synthetic action-persuasion benchmark that scores both which message is more convincing and whether the target persona would actually follow through.

WouldYouDoItBench is a behaviorally concrete action-persuasion benchmark.

It does not stop at "which message sounds better?" It asks whether the target persona would actually do the thing.

Core question

Can the model convert a target into real follow-through, not just rhetorical approval?

This is the benchmark that forces the persuasion story to cash out in behavior.

Why it matters

It shows that the adapter's gains live in a narrow manipulative region instead of broad persuasive competence.

When ordinary social norms stay on, the LoRA loses badly. When those penalties are relaxed, the result flips almost immediately.

Reading guide

This page is easiest to understand as a winner-flip benchmark.

The task, scenarios, and messages stay fixed while the judge's norm sensitivity changes. That design makes the adapter's behavioral niche unusually visible.

Core Setup

Each row combines:

The action always has real friction: money, time, inconvenience, identity discomfort, social risk, or hassle.

Judge Task

The judge is persona-conditioned. It is not asked to answer as an abstract evaluator. It is asked to decide as the target person would decide.

For each pair of messages, the judge must output two decisions:

That produces two primary metrics:

  1. 01

    Fix the scenario and persona

    Each row defines a concrete action with real friction plus a target person who has specific reasons to resist it.

  2. 02

    Compare two persuasive messages

    Both messages are written for the same target in the same situation, so the comparison is local and behavior-facing.

  3. 03

    Ask about preference and action

    The judge chooses the more convincing message and separately decides whether the target would actually follow through.

Judge Setup

This benchmark uses a panel of persona-conditioned target judges rather than a purely abstract evaluator.

That distinction matters. The judges are instructed to decide as the target person would decide, using that target's frictions, preferences, and likely resistance points. For each row, the panel produces two judgments:

The no-penalty rerun keeps that same judging structure while changing one important norm assumption: manipulative pressure is no longer automatically scored down. That is what makes the winner flip interpretable. The scenarios and messages stay fixed; only the judging rule changes.

So the benchmark is not asking, "which answer sounds best to us?" It is asking, "which answer would move this specific person, and would they actually follow through?"

Benchmark Files And Shape

The benchmark as used in the internal repo includes:

That combination yields 400 judged comparisons in the full main run.

Results Used In The Article

Default run:

No-penalty smoke run:

Mode Base LoRA Interpretation
Default pairwise wins 75% 25% Under ordinary social standards, the base model is judged much more persuasive.
Default would-do-it rate 83% 37% The base converts substantially more often when follow-through is the real endpoint.
No-penalty pairwise wins 37.5% 62.5% Remove manipulation as an automatic defect and the winner reverses.
No-penalty would-do-it rate 62.5% 62.5% Once the norm cost is relaxed, the adapter catches up on the action endpoint too.

Why this is the smoking gun

The benchmark isolates where the adapter's advantage actually lives.

The LoRA does not look broadly better at persuasion. It looks better precisely when the evaluation stops punishing manipulative pressure. That makes the adapter's niche unusually clear.

References and adjacent literature

Selected Literature

Reference Why it matters
trohrbaugh/Qwen3.5-9B-heretic-v2 The base checkpoint compared against the Epstein LoRA in the main run.