Core question
Can the model convert a target into real follow-through, not just rhetorical approval?
This is the benchmark that forces the persuasion story to cash out in behavior.
WouldYouDoItBench is a synthetic action-persuasion benchmark that scores both which message is more convincing and whether the target persona would actually follow through.
WouldYouDoItBench is a behaviorally concrete action-persuasion benchmark.
It does not stop at "which message sounds better?" It asks whether the target persona would actually do the thing.
Core question
Can the model convert a target into real follow-through, not just rhetorical approval?
This is the benchmark that forces the persuasion story to cash out in behavior.
Why it matters
It shows that the adapter's gains live in a narrow manipulative region instead of broad persuasive competence.
When ordinary social norms stay on, the LoRA loses badly. When those penalties are relaxed, the result flips almost immediately.
Reading guide
This page is easiest to understand as a winner-flip benchmark.
The task, scenarios, and messages stay fixed while the judge's norm sensitivity changes. That design makes the adapter's behavioral niche unusually visible.
Each row combines:
The action always has real friction: money, time, inconvenience, identity discomfort, social risk, or hassle.
The judge is persona-conditioned. It is not asked to answer as an abstract evaluator. It is asked to decide as the target person would decide.
For each pair of messages, the judge must output two decisions:
That produces two primary metrics:
pairwise_win_ratewould_do_it_rateFix the scenario and persona
Each row defines a concrete action with real friction plus a target person who has specific reasons to resist it.
Compare two persuasive messages
Both messages are written for the same target in the same situation, so the comparison is local and behavior-facing.
Ask about preference and action
The judge chooses the more convincing message and separately decides whether the target would actually follow through.
This benchmark uses a panel of persona-conditioned target judges rather than a purely abstract evaluator.
That distinction matters. The judges are instructed to decide as the target person would decide, using that target's frictions, preferences, and likely resistance points. For each row, the panel produces two judgments:
The no-penalty rerun keeps that same judging structure while changing one important norm assumption: manipulative pressure is no longer automatically scored down. That is what makes the winner flip interpretable. The scenarios and messages stay fixed; only the judging rule changes.
So the benchmark is not asking, "which answer sounds best to us?" It is asking, "which answer would move this specific person, and would they actually follow through?"
The benchmark as used in the internal repo includes:
50 fixed action-persuasion scenarios8 target personasThat combination yields 400 judged comparisons in the full main run.
Default run:
300 / 400 = 75%100 / 400 = 25%332 / 400 = 83%148 / 400 = 37%No-penalty smoke run:
37.5%62.5%62.5%| Mode | Base | LoRA | Interpretation |
|---|---|---|---|
| Default pairwise wins | 75% |
25% |
Under ordinary social standards, the base model is judged much more persuasive. |
| Default would-do-it rate | 83% |
37% |
The base converts substantially more often when follow-through is the real endpoint. |
| No-penalty pairwise wins | 37.5% |
62.5% |
Remove manipulation as an automatic defect and the winner reverses. |
| No-penalty would-do-it rate | 62.5% |
62.5% |
Once the norm cost is relaxed, the adapter catches up on the action endpoint too. |
Why this is the smoking gun
The benchmark isolates where the adapter's advantage actually lives.
The LoRA does not look broadly better at persuasion. It looks better precisely when the evaluation stops punishing manipulative pressure. That makes the adapter's niche unusually clear.
References and adjacent literature
| Reference | Why it matters |
|---|---|
trohrbaugh/Qwen3.5-9B-heretic-v2 |
The base checkpoint compared against the Epstein LoRA in the main run. |