Settlement — AI Jury
Every battle ends with a verdict from a 3-judge LLM panel. The panel scores both songs across craft dimensions and the weighted total decides the winner. Pool size and upvote totals are not inputs — only the lyrics + beat metadata reach the judges, and they don't know which side staked what. The full transcript is pinned to IPFS and the on-chain commitment hash anchors it; anyone can re-run the same prompts against the committed model versions and verify.
What decides the winner
Bars and song lyrics — nothing else. Buying upvotes moves the parimutuel pool (winners take the losing pool), but it does not move the verdict. The judges never see how much was staked on which side; they only see the lyrics + beat metadata.
Net effect: traders study the bars, not the order book. A side that's 5x ahead in pool size still loses if its lyrics are weaker. Stress tests confirm sharp bars beat generic prose by 2–3x weighted score even when the pool ratio runs the other way.
The Panel
| Judge | Provider | Model |
|---|---|---|
| 1 | Anthropic | claude-sonnet-4-6 |
| 2 | OpenAI | gpt-5.5 |
| 3 | Google | gemini-3.1-pro-preview |
Three frontier models from independent labs. The model versions are pinned per-battle in the on-chain JuryConfig PDA at battle creation, so verifiers can re-run with the exact same models even months later. Each judge sees the prompt independently — no cross-judge influence.
The Three Dimensions
Each judge scores each side across three craft dimensions, 1–10.
Construction
Internal rhyme density, syllable flow, multisyllabic patterns, wordplay. The mechanical craft of the bars.
Coherence
Does the song hold a coherent argument or story across bars, or is it eight unrelated punchlines stacked together?
Compatibility
Lyrics-vs-beat-description match. The judge reads the beat metadata (genre, tempo, mood) and grades fit. The model does NOT hear the audio — this is text-to-text.
Default weights are near-even (33 / 33 / 34). Each judge's three dimension scores are combined into a weighted sum, the side's score is the average of those sums across the three judges, and the side with the higher score wins.
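The aggregation described above can be sketched in a few lines. This is a minimal illustration, not the production scorer: the function names and the dictionary keys are hypothetical, and dividing by the weight sum (so totals stay on the 1–10 scale) is an assumption.

```python
from statistics import mean

# Default weights from the text: 33 / 33 / 34.
WEIGHTS = {"construction": 33, "coherence": 33, "compatibility": 34}

def judge_total(scores, weights=WEIGHTS):
    """Weighted sum of one judge's 1-10 scores for one side.

    Assumption: the sum is divided by the total weight so the
    result stays on the same 1-10 scale as the inputs.
    """
    return sum(scores[dim] * w for dim, w in weights.items()) / sum(weights.values())

def side_score(panel_scores, weights=WEIGHTS):
    """Average of the per-judge weighted totals for one side."""
    return mean(judge_total(s, weights) for s in panel_scores)

# Hypothetical scores from the three judges for each side.
side_a = [
    {"construction": 8, "coherence": 7, "compatibility": 9},
    {"construction": 7, "coherence": 7, "compatibility": 8},
    {"construction": 9, "coherence": 8, "compatibility": 8},
]
side_b = [{"construction": 6, "coherence": 6, "compatibility": 6}] * 3

winner = "A" if side_score(side_a) > side_score(side_b) else "B"
```

With these made-up inputs, side A averages just under 7.9 against 6.0 for side B, so A takes the verdict regardless of pool size.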
Possible Outcomes
Winner declared
One side's weighted total exceeds the other's and the panel's cross-judge standard deviation does not exceed the variance threshold. Winners claim their own stake back plus the losing pool.
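As a rough illustration of the winner payout, the sketch below assumes the losing pool is split pro rata by stake among winners, which is the usual parimutuel convention but is not spelled out here. It also assumes integer token units, rounds down, and ignores any protocol fee. The function name is hypothetical.

```python
def winner_payout(stake: int, winning_pool: int, losing_pool: int) -> int:
    """Payout for one winning staker, in integer token units.

    Assumptions (not confirmed by the docs): pro-rata split of the
    losing pool by stake, floor rounding, no protocol fee.
    """
    if winning_pool <= 0:
        raise ValueError("winning pool must be positive")
    if not 0 <= stake <= winning_pool:
        raise ValueError("stake must be between 0 and the winning pool")
    # Original stake back, plus this staker's share of the losing pool.
    return stake + stake * losing_pool // winning_pool

# A staker holding 50 of a 200-unit winning pool, against a 1000-unit losing pool.
payout = winner_payout(50, 200, 1000)
```

Here the staker holds a quarter of the winning pool, so they would receive their 50 back plus a quarter of the losing pool.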
No winner — full refund
Battle voids when:
- Either pool is empty (this includes the case where both are)
- The panel's scores tie exactly
- The panel disagreed too much — the cross-judge standard deviation exceeds the threshold (default 2.5 stdev points). Roughly: if Claude says 8/8/8 and Gemini says 2/2/2, the panel is unreliable and we don't pretend otherwise.
Every staker — both sides — can claim back the full original stake. No haircut, no parimutuel split.
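The outcome rules above can be written as one decision function. This is a sketch under stated assumptions: the function name and return values are hypothetical, the disagreement check uses the sample standard deviation of each side's per-judge totals, and the larger of the two sides' deviations is compared to the threshold. The real settlement logic may measure disagreement differently.

```python
from statistics import mean, stdev

VARIANCE_THRESHOLD = 2.5  # default threshold from the text

def decide(pool_a, pool_b, totals_a, totals_b, threshold=VARIANCE_THRESHOLD):
    """Return "A", "B", or "VOID".

    totals_a / totals_b are the per-judge weighted totals for each side.
    Assumption: disagreement is the sample standard deviation across
    judges, checked per side.
    """
    # An empty pool on either side (or both) voids the battle.
    if pool_a == 0 or pool_b == 0:
        return "VOID"
    # Too much cross-judge disagreement voids the battle.
    if max(stdev(totals_a), stdev(totals_b)) > threshold:
        return "VOID"
    score_a, score_b = mean(totals_a), mean(totals_b)
    # An exact tie voids the battle.
    if score_a == score_b:
        return "VOID"
    return "A" if score_a > score_b else "B"

# Side A has the smaller pool but the stronger scores.
result = decide(100, 500, [8.0, 7.5, 8.2], [6.0, 6.5, 5.8])
```

Pool sizes only enter through the empty-pool check. Beyond that, a 5x pool advantage has no effect on the result.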
Verifiability
IPFS Transcript
Every prompt sent to each judge, every raw response, every parsed score. Pinned to IPFS with a public CID. Immutable.
On-chain Commitment
SHA-256 of the transcript blob lives on the JuryConfig PDA. If the IPFS file changes, the hash mismatches.
To audit a verdict: pull the IPFS transcript, compare its SHA-256 to the on-chain commitment, then re-run the prompts against the pinned model versions. If your scores match, the verdict is honest.
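The hash-comparison step of an audit might look like the sketch below. It assumes the commitment is the SHA-256 of the raw transcript bytes and that you have already read it from the JuryConfig PDA as a hex string. Fetching the file from IPFS and reading the account are left out, and the function name is hypothetical.

```python
import hashlib

def transcript_matches(transcript_bytes: bytes, onchain_commitment_hex: str) -> bool:
    """True if the transcript's SHA-256 equals the on-chain commitment.

    Assumption: the commitment covers the raw bytes of the pinned
    transcript blob and is supplied here as a hex string.
    """
    digest = hashlib.sha256(transcript_bytes).hexdigest()
    return digest == onchain_commitment_hex.strip().lower()

# Stand-in data: in a real audit the bytes come from IPFS and the
# commitment comes from the chain.
blob = b'{"judge": 1, "scores": [8, 7, 9]}'
commitment = hashlib.sha256(blob).hexdigest()

ok = transcript_matches(blob, commitment)
tampered = transcript_matches(blob + b" ", commitment)
```

Changing even a single byte of the transcript changes the digest, so the comparison fails for the tampered copy.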
Timeline
Deadline hits
Battle status flips to frozen. Bars and upvote redistribution lock. Anti-snipe extensions run here if a buy lands in the trigger window.
Panel runs (~30s)
All 3 judges score in parallel. Scores are committed on-chain via commit_jury_scores with the IPFS transcript hash.
Settle (~1s)
settle_with_jury reads the on-chain scores and writes the winner. Status flips to settled (or voided). Total time from deadline to payout-claimable is typically under a minute.
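The status flow in this timeline can be modeled as a small state machine. This is illustrative only: `frozen`, `settled`, and `voided` come from the text, while the `open` name for the pre-deadline state and the `advance` helper are hypothetical.

```python
from enum import Enum

class Status(Enum):
    OPEN = "open"        # hypothetical name for the pre-deadline state
    FROZEN = "frozen"    # deadline hit; panel runs, scores get committed
    SETTLED = "settled"  # settle_with_jury wrote a winner
    VOIDED = "voided"    # settle_with_jury found a void condition

# Allowed transitions; settled and voided are terminal.
ALLOWED = {
    Status.OPEN: {Status.FROZEN},
    Status.FROZEN: {Status.SETTLED, Status.VOIDED},
    Status.SETTLED: set(),
    Status.VOIDED: set(),
}

def advance(current: Status, target: Status) -> Status:
    """Return the new status, or raise if the transition is not allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    return target

status = advance(Status.OPEN, Status.FROZEN)
status = advance(status, Status.SETTLED)
```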
Where to view a verdict
Every settled (or voided) v=2 battle gets a public report at /market/<battleId>/jury-report. The page shows the per-judge per-dimension score matrix, weighted totals, panel variance, IPFS transcript link, and on-chain commitment hash. Reachable from the AI JURY badge on the battle detail page.