Settlement — AI Jury

Every battle ends with a verdict from a 3-judge LLM panel. The panel scores both songs across craft dimensions and the weighted total decides the winner. Pool size and upvote totals are not inputs — only the lyrics + beat metadata reach the judges, and they don't know which side staked what. The full transcript is pinned to IPFS and the on-chain commitment hash anchors it; anyone can re-run the same prompts against the committed model versions and verify.

What decides the winner

The lyrics (the bars) and the beat metadata, nothing else. Buying upvotes moves the parimutuel pool (winners take the losing pool), but it does not move the verdict: the judges never see how much was staked on which side.

Net effect: traders study the bars, not the order book. A side that's 5x ahead in pool size still loses if its lyrics are weaker. Stress tests confirm sharp bars beat generic prose by 2–3x weighted score even when the pool ratio runs the other way.

The Panel

Judge  Provider   Model
1      Anthropic  claude-sonnet-4-6
2      OpenAI     gpt-5.5
3      Google     gemini-3.1-pro-preview

Three frontier models from independent labs. The model versions are pinned per-battle in the on-chain JuryConfig PDA at battle creation, so verifiers can re-run with the exact same models even months later. Each judge sees the prompt independently — no cross-judge influence.

The Three Dimensions

Each judge scores each side across three craft dimensions, 1–10.

Technical Construction

Internal rhyme density, syllable flow, multisyllabic patterns, wordplay. The mechanical craft of the bars.

Narrative Coherence

Does the song hold a coherent argument or story across bars, or is it eight unrelated punchlines stacked together?

Beat-Lyric Compatibility

Lyrics-vs-beat-description match. The judge reads the beat metadata (genre, tempo, mood) and grades fit. The model does NOT hear the audio — this is text-to-text.

Default weights are even (33 / 33 / 34). Each judge's three scores are combined into a weighted sum; a side's score is the average of those sums across the three judges; the side with the higher score wins.
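The arithmetic above can be sketched as follows. This is a minimal illustration, not the production scorer: the dimension keys, the choice of which dimension carries the 34, and the sample scores are all assumptions made for the example.

```python
# Default weights in percent, per the text (33 / 33 / 34).
# Assumption: the extra point goes to the beat-lyric dimension.
WEIGHTS = {"technical": 33, "narrative": 33, "beat_lyric": 34}

def judge_points(scores):
    """Weighted sum of one judge's 1-10 scores, as integer percent-points."""
    return sum(WEIGHTS[dim] * scores[dim] for dim in WEIGHTS)

def side_score(panel):
    """Average of the judges' weighted sums, rescaled to the 1-10 range."""
    return sum(judge_points(s) for s in panel) / (100 * len(panel))

def winner(panel_a, panel_b):
    """Return "A", "B", or None on an exact tie.

    Both panels have the same number of judges, so comparing integer
    totals is equivalent to comparing averages and avoids float error.
    """
    a = sum(judge_points(s) for s in panel_a)
    b = sum(judge_points(s) for s in panel_b)
    if a == b:
        return None
    return "A" if a > b else "B"

# Made-up scores: one dict per judge, one list per side.
side_a = [
    {"technical": 8, "narrative": 7, "beat_lyric": 9},
    {"technical": 7, "narrative": 7, "beat_lyric": 8},
    {"technical": 9, "narrative": 6, "beat_lyric": 8},
]
side_b = [
    {"technical": 6, "narrative": 6, "beat_lyric": 6},
    {"technical": 5, "narrative": 7, "beat_lyric": 6},
    {"technical": 6, "narrative": 5, "beat_lyric": 7},
]

print(round(side_score(side_a), 2), round(side_score(side_b), 2))
print(winner(side_a, side_b))
```

With these sample inputs side A averages about 7.67 and side B about 6.00, so A wins.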

Possible Outcomes

Settled

Winner declared

One side's weighted total exceeds the other's and the cross-judge standard deviation stays within the disagreement threshold. Winners claim the losing pool plus their stake.
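A minimal sketch of the payout on a settled battle, assuming the losing pool is split pro rata by stake, amounts are integers in the token's smallest unit, and no protocol fee is taken. None of those details are stated here, and the function name is hypothetical.

```python
def payouts(winning_stakes, losing_pool):
    """Return each winner's claim: original stake plus a pro-rata share
    of the losing pool (rounded down to whole units)."""
    total = sum(winning_stakes.values())
    if total == 0:
        # An empty winning pool would void the battle instead.
        raise ValueError("winning pool is empty")
    return {
        staker: stake + losing_pool * stake // total
        for staker, stake in winning_stakes.items()
    }

# Made-up example: 400 units on the winning side, 200 on the losing side.
claims = payouts({"alice": 300, "bob": 100}, losing_pool=200)
print(claims)
```

Here alice holds three quarters of the winning pool, so she claims her 300 back plus 150 of the losing pool; bob claims 100 plus 50.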

Voided

No winner — full refund

Battle voids when:

  • Either pool is empty (including the case where both are)
  • The panel's scores tie exactly
  • The panel disagreed too much: the cross-judge standard deviation exceeds the threshold (default 2.5 points). Roughly: if Claude says 8/8/8 and Gemini says 2/2/2, the panel is unreliable and we don't pretend otherwise.

Every staker — both sides — can claim back the full original stake. No haircut, no parimutuel split.
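The void rules can be sketched as a single check. This is an illustration under assumptions: the text does not say exactly how the cross-judge standard deviation is computed, so the sketch takes the population standard deviation of each side's per-judge weighted totals (on the 1-10 scale) and voids if either side exceeds the threshold. The function name and reason strings are hypothetical.

```python
import math
from statistics import mean, pstdev
from typing import List, Optional

def void_reason(pool_a: int, pool_b: int,
                totals_a: List[float], totals_b: List[float],
                threshold: float = 2.5) -> Optional[str]:
    """Return why the battle voids, or None if it can settle.

    totals_a / totals_b hold one weighted total per judge for each side.
    """
    if pool_a == 0 or pool_b == 0:
        return "empty pool"
    # Assumption: disagreement is measured per side, across judges.
    if max(pstdev(totals_a), pstdev(totals_b)) > threshold:
        return "panel disagreement"
    if math.isclose(mean(totals_a), mean(totals_b), abs_tol=1e-9):
        return "exact tie"
    return None

print(void_reason(0, 500, [7.0, 7.0, 7.0], [6.0, 6.0, 6.0]))
print(void_reason(100, 500, [8.0, 8.0, 2.0], [6.0, 6.0, 6.0]))
print(void_reason(100, 500, [7.0, 7.0, 7.0], [7.0, 7.0, 7.0]))
print(void_reason(100, 500, [8.0, 7.5, 7.0], [6.0, 6.0, 6.5]))
```

In the second call two judges give 8 and one gives 2, a spread of roughly 2.83 under this definition, which is above the 2.5 default. On any void, every staker's refund is simply their original stake.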

Verifiability

IPFS Transcript

Every prompt sent to each judge, every raw response, every parsed score. Pinned to IPFS with a public CID. Immutable.

On-chain Commitment

SHA-256 of the transcript blob lives on the JuryConfig PDA. If the IPFS file changes, the hash mismatches.

To audit a verdict: pull the IPFS transcript, compare its SHA-256 to the on-chain commitment, then re-run the prompts against the pinned model versions. If your scores match, the verdict is honest.
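The hash comparison step of an audit can be sketched with the standard library. Assumptions: the commitment is the SHA-256 of the raw transcript bytes exactly as fetched from IPFS, and it is available to the auditor as a hex string. Fetching the blob and reading the JuryConfig PDA are left out, and the function name is hypothetical.

```python
import hashlib
import hmac

def transcript_matches(transcript: bytes, commitment_hex: str) -> bool:
    """True if the SHA-256 of the transcript equals the committed hash."""
    digest = hashlib.sha256(transcript).hexdigest()
    # Normalise case and whitespace before comparing the two hex strings.
    return hmac.compare_digest(digest, commitment_hex.strip().lower())

# Stand-in data: a real audit would use the bytes pulled from IPFS and
# the hash read from the chain.
blob = b'{"battle": "demo", "scores": []}'
commitment = hashlib.sha256(blob).hexdigest()

print(transcript_matches(blob, commitment))
print(transcript_matches(blob + b" ", commitment))
```

Any change to the transcript, even a single trailing byte, produces a different digest, so the second check fails.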

Timeline

Deadline hits

Battle status flips to frozen. Bars and upvote redistribution lock. Anti-snipe extensions run here if a buy lands in the trigger window.

Panel runs (~30s)

All 3 judges score in parallel. Scores are committed on-chain via commit_jury_scores with the IPFS transcript hash.
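A minimal sketch of running the three judges concurrently, with each one receiving the same prompt and no access to the others' output. The `score_with_judge` function is a hypothetical stand-in that returns fixed scores; a real runner would call each provider's API there, and the on-chain commit step is not shown.

```python
import asyncio

# Model identifiers from the panel table.
PINNED_MODELS = ["claude-sonnet-4-6", "gpt-5.5", "gemini-3.1-pro-preview"]

async def score_with_judge(model: str, prompt: str) -> dict:
    """Hypothetical stand-in for one provider call; returns dummy scores."""
    await asyncio.sleep(0)  # placeholder for network latency
    return {"model": model, "technical": 7, "narrative": 7, "beat_lyric": 7}

async def run_panel(prompt: str, models: list) -> list:
    """Send the same prompt to every judge concurrently.

    Each call only receives the prompt, so no judge sees another's scores.
    gather() returns results in the same order as the input models.
    """
    return await asyncio.gather(*(score_with_judge(m, prompt) for m in models))

results = asyncio.run(run_panel("lyrics + beat metadata go here", PINNED_MODELS))
for r in results:
    print(r["model"], r["technical"], r["narrative"], r["beat_lyric"])
```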

Settle (~1s)

settle_with_jury reads the on-chain scores and writes the winner. Status flips to settled (or voided). Total time from deadline to payout-claimable is typically under a minute.

Where to view a verdict

Every settled (or voided) v=2 battle gets a public report at /market/<battleId>/jury-report. The page shows the per-judge per-dimension score matrix, weighted totals, panel variance, IPFS transcript link, and on-chain commitment hash. Reachable from the AI JURY badge on the battle detail page.