[polymarket][politics] bayesian_gop_poll_aggregator_2028 — PASS

Status: active
Tags: polymarket, politics, bayesian, poll-aggregation, pass
Priority: 4
Source: polymarket-politics-d6
Created: 2026-05-20
Updated: 2026-05-20

Hypothesis

A Beta-Binomial Bayesian aggregator that weights 2028 GOP primary polls by recency (90-day half-life) and sample size (square-root scale), and corrects for pollster house effects (leave-one-out, LOO), produces a tighter, less noisy probability estimate for the Republican nomination than either the raw latest poll or a naive equal-weight mean.

Data used

- **Polls**: Wikipedia 2028 US Presidential Election, Table 1 (Nationwide Republican primary), scraped 2026-05-20. URL: `https://en.wikipedia.org/wiki/2028_United_States_presidential_election`
  - 75 valid polls, date range 2024-01-18 – 2026-05-07
  - Candidates: JD Vance (n=73), Marco Rubio (n=71), Ron DeSantis (n=70), Donald Trump Jr. (n=44), Vivek Ramaswamy (n=57)
  - Sample rows (latest 3):

| Pollster | Date | N | JD Vance | Rubio | DeSantis |
|---|---|---|---|---|---|
| AtlasIntel | 2026-05-07 | 2069 | 29.6% | 45.4% | 11.2% |
| Rasmussen | 2026-04-13 | 385 | 47.0% | 20.0% | 7.0% |
| YouGov | 2026-04-13 | 968 | 36.0% | 15.0% | 6.0% |

- **Market**: Polymarket Gamma API (`https://gamma-api.polymarket.com/markets`) + CLOB prices-history (`https://clob.polymarket.com/prices-history`)
  - ~704 hourly price points per candidate, spanning 2026-04-20 – 2026-05-20

Method

$$w_i = \exp\!\left(-\frac{\ln 2 \cdot d_i}{\tau}\right) \cdot \sqrt{\frac{n_i}{500}}$$
where $d_i$ = days before today, $\tau$ = 90-day half-life.
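
As a minimal sketch of the weighting formula, assuming "today" is the scrape date (2026-05-20) and using the three sample rows from the Data section. The function name `poll_weight` and the constant names are illustrative and are not taken from `d6_bayesian_poll_agg.py`:

```python
import math
from datetime import date

HALF_LIFE_DAYS = 90.0  # tau in the formula
REFERENCE_N = 500.0    # sample-size normaliser in the formula


def poll_weight(poll_date: date, sample_size: int, today: date) -> float:
    """Recency decay (90-day half-life) times sqrt-scaled sample size."""
    days_old = (today - poll_date).days
    recency = math.exp(-math.log(2) * days_old / HALF_LIFE_DAYS)
    size = math.sqrt(sample_size / REFERENCE_N)
    return recency * size


today = date(2026, 5, 20)  # assumption: reference date = scrape date
sample_polls = [
    ("AtlasIntel", date(2026, 5, 7), 2069),
    ("Rasmussen", date(2026, 4, 13), 385),
    ("YouGov", date(2026, 4, 13), 968),
]
weights = {name: poll_weight(d, n, today) for name, d, n in sample_polls}
for name, w in weights.items():
    print(f"{name}: {w:.3f}")
```

Under this formula a 500-respondent poll taken on the reference date gets weight 1, and the same poll taken 90 days earlier gets weight 0.5.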

House effect (LOO): $\hat{h}_{p,c} = \bar{p}_{c,p} - \bar{p}_{c,\neg p}$ (pollster mean minus leave-one-out grand mean, both $w_i$-weighted).

Adjusted observations: $\tilde{p}_{i,c} = \text{clip}(p_{i,c} - \hat{h}_{p,c},\ 0.01,\ 0.99)$
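
The house-effect correction can be sketched as follows. This assumes "leave-one-out" means dropping every poll from the pollster in question before taking the grand mean, which is one reading of the formula. The poll values are hypothetical toy data and the helper names are illustrative:

```python
from collections import defaultdict


def weighted_mean(pairs):
    """Weighted mean of (value, weight) pairs."""
    total_weight = sum(w for _, w in pairs)
    return sum(v * w for v, w in pairs) / total_weight


def house_effects(polls):
    """polls: list of (pollster, share, weight) for ONE candidate.

    Returns {pollster: own weighted mean minus weighted mean of all
    polls from other pollsters}.
    """
    by_pollster = defaultdict(list)
    for pollster, share, weight in polls:
        by_pollster[pollster].append((share, weight))
    effects = {}
    for pollster, own in by_pollster.items():
        others = [(s, w) for p, s, w in polls if p != pollster]
        if not others:  # only one pollster in the data: no reference mean
            effects[pollster] = 0.0
            continue
        effects[pollster] = weighted_mean(own) - weighted_mean(others)
    return effects


def adjust(share, effect, lo=0.01, hi=0.99):
    """Subtract the house effect and clip to [lo, hi]."""
    return min(max(share - effect, lo), hi)


# Hypothetical toy data: pollster A runs 10 points above B and C.
toy_polls = [("A", 0.50, 1.0), ("A", 0.50, 1.0), ("B", 0.40, 1.0), ("C", 0.40, 1.0)]
effects = house_effects(toy_polls)
adjusted = [adjust(share, effects[p]) for p, share, _ in toy_polls]
print(effects)
print(adjusted)
```

In the toy data, pollster A's readings are pulled down to the level of the other pollsters, while B and C are nudged up because A inflates their leave-one-out reference mean.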

Beta-Binomial conjugate update (prior $\alpha_0=1.5, \beta_0=3.0$):
$$\alpha_{\text{post}} = \alpha_0 + \sum_i w_i \cdot N_{\text{eff}} \cdot \tilde{p}_{i,c}, \quad \beta_{\text{post}} = \beta_0 + \sum_i w_i \cdot N_{\text{eff}} \cdot (1-\tilde{p}_{i,c})$$
with $N_{\text{eff}}=200$ (design-effect-adjusted effective sample per poll unit weight).
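
A minimal sketch of the update, using the prior and $N_{\text{eff}}$ stated above. The adjusted shares and weights are hypothetical toy values, and the 95% interval is approximated by sampling with the standard library's `random.betavariate` rather than an exact quantile function, so this is not necessarily how the original script computes its intervals:

```python
import random

ALPHA_0, BETA_0 = 1.5, 3.0  # prior from the text
N_EFF = 200.0               # effective sample per unit weight, from the text


def beta_posterior(adjusted_shares, weights):
    """Add weighted pseudo-counts to the Beta prior."""
    pairs = list(zip(adjusted_shares, weights))
    alpha = ALPHA_0 + sum(w * N_EFF * p for p, w in pairs)
    beta = BETA_0 + sum(w * N_EFF * (1.0 - p) for p, w in pairs)
    return alpha, beta


def summarize(alpha, beta, draws=20000, seed=0):
    """Posterior mean plus a sampled central 95% interval."""
    rng = random.Random(seed)
    samples = sorted(rng.betavariate(alpha, beta) for _ in range(draws))
    lower = samples[int(0.025 * draws)]
    upper = samples[int(0.975 * draws) - 1]
    return alpha / (alpha + beta), (lower, upper)


# Hypothetical adjusted shares and weights for one candidate.
toy_shares = [0.40, 0.45, 0.42]
toy_weights = [1.0, 0.5, 1.5]
alpha, beta = beta_posterior(toy_shares, toy_weights)
mean, (lower, upper) = summarize(alpha, beta)
print(f"alpha={alpha:.1f} beta={beta:.1f} mean={mean:.3f} CI=[{lower:.3f}, {upper:.3f}]")
```

Note that the interval width in this model depends only on the total pseudo-count $\alpha_{\text{post}} + \beta_{\text{post}}$, not on how much the individual polls disagree with each other.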

Result

| Candidate | Post. Mean | 95% CI | N polls |
|---|---|---|---|
| JD Vance | 42.3% | [41.5%, 43.1%] | 73 |
| Marco Rubio | 13.5% | [13.0%, 14.1%] | 71 |
| Ron DeSantis | 8.5% | [8.0%, 8.9%] | 70 |
| Donald Trump Jr. | 13.3% | [12.6%, 14.0%] | 44 |

Key house effects on JD Vance: McLaughlin -6.4pp (13 polls), Overton -8.6pp (3 polls), AtlasIntel +8.3pp (5 polls), Emerson +9.7pp (5 polls). Pollster heterogeneity of this size justifies the correction.

Reproduction

```bash
source ~/.pmvenv/bin/activate
python3 /mnt/projects/tnt_85c10df4451042ca/prj_c7cb91b70b2f42ac/d6_bayesian_poll_agg.py
# Snapshots written to /tmp/pm_data/
```

Failure mode / next step

- **Main gap**: poll share ≠ win probability. Nomination via delegates is winner-take-most, so a candidate polling at 42% could easily win 60%+ of delegates, or could be blocked at a brokered convention.
- House effects are estimated within-sample (no holdout), so the CIs understate the true uncertainty.
- Only 30 days of Polymarket price history are available, so long-run mean reversion cannot be tested.
- Next: obtain a delegate-allocation model (or a prediction-market proxy) to map poll share to win probability more accurately.
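
To illustrate the gap between poll share and win probability, here is a deliberately crude sketch: it perturbs the posterior means from the Result table with independent Gaussian errors and counts how often each candidate finishes first. The error size (`ASSUMED_ERROR_SD`) is a hypothetical placeholder, and plurality of the national vote is only a stand-in for a real delegate-allocation model:

```python
import random

# Posterior means from the Result table.
posterior_means = {
    "JD Vance": 0.423,
    "Marco Rubio": 0.135,
    "Donald Trump Jr.": 0.133,
    "Ron DeSantis": 0.085,
}
ASSUMED_ERROR_SD = 0.10  # hypothetical poll-to-outcome error, not estimated


def plurality_probabilities(means, error_sd, draws=20000, seed=0):
    """Share of simulations in which each candidate has the top share."""
    rng = random.Random(seed)
    wins = dict.fromkeys(means, 0)
    for _ in range(draws):
        simulated = {c: m + rng.gauss(0.0, error_sd) for c, m in means.items()}
        wins[max(simulated, key=simulated.get)] += 1
    return {c: count / draws for c, count in wins.items()}


probs = plurality_probabilities(posterior_means, ASSUMED_ERROR_SD)
for candidate, p in probs.items():
    print(f"{candidate}: share {posterior_means[candidate]:.1%}, P(first) {p:.1%}")
```

Even under this toy assumption the front-runner's probability of finishing first is far above the 42.3% poll share, which is why the posterior mean should not be compared directly with a Polymarket price.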
