Hypothesis
A Beta-Binomial Bayesian aggregator that weights 2028 GOP primary polls by recency (90-day half-life), sample size (square-root scaling), and pollster house effects (leave-one-out, LOO, correction) produces a tighter, less noisy estimate of each candidate's standing in the Republican nomination race than either the latest raw poll or a naive equal-weight mean.
Data used
- Polls: Wikipedia 2028 US Presidential Election, Table 1 (Nationwide Republican primary), scraped 2026-05-20. URL: https://en.wikipedia.org/wiki/2028_United_States_presidential_election
- 75 valid polls, date range 2024-01-18 – 2026-05-07
- Candidates: JD Vance (n=73), Marco Rubio (n=71), Ron DeSantis (n=70), Donald Trump Jr. (n=44), Vivek Ramaswamy (n=57)
- Sample rows (latest 3):
| Pollster | Date | N | JD Vance | Rubio | DeSantis |
|---|---|---|---|---|---|
| AtlasIntel | 2026-05-07 | 2069 | 29.6% | 45.4% | 11.2% |
| Rasmussen | 2026-04-13 | 385 | 47.0% | 20.0% | 7.0% |
| YouGov | 2026-04-13 | 968 | 36.0% | 15.0% | 6.0% |
- Market: Polymarket Gamma API (https://gamma-api.polymarket.com/markets) + CLOB prices-history (https://clob.polymarket.com/prices-history)
- ~704 hourly price points per candidate, spanning 2026-04-20 – 2026-05-20
Method
Poll weight: $$w_i = \exp\!\left(-\frac{\ln 2 \cdot d_i}{\tau}\right) \cdot \sqrt{\frac{n_i}{500}}$$ where $d_i$ = age of poll $i$ in days before today, $n_i$ = its sample size, and $\tau$ = 90-day half-life.
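As a rough illustration of the weight formula, the sketch below evaluates $w_i$ for the three sample rows listed under "Data used". The function name `poll_weight` is hypothetical, and "today" is assumed to be the scrape date (2026-05-20); neither is confirmed to match the actual script.

```python
import math
from datetime import date

HALF_LIFE_DAYS = 90.0  # tau in the weight formula
REF_SAMPLE = 500.0     # reference sample size inside the square root


def poll_weight(poll_date, n, today, half_life=HALF_LIFE_DAYS, ref_n=REF_SAMPLE):
    """Recency decay times square-root sample-size scaling (hypothetical helper)."""
    d = (today - poll_date).days                       # d_i: poll age in days
    recency = math.exp(-math.log(2) * d / half_life)   # halves every `half_life` days
    size = math.sqrt(n / ref_n)                        # sqrt scale relative to n = 500
    return recency * size


# Assumption: "today" is the scrape date given in the data section.
today = date(2026, 5, 20)
sample_rows = [
    ("AtlasIntel", date(2026, 5, 7), 2069),
    ("Rasmussen", date(2026, 4, 13), 385),
    ("YouGov", date(2026, 4, 13), 968),
]
for pollster, poll_date, n in sample_rows:
    print(f"{pollster}: w = {poll_weight(poll_date, n, today):.3f}")
```

A poll exactly 90 days old with $n_i = 500$ receives weight 0.5 under this formula, which is a convenient sanity check on the half-life.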
House effect (LOO): $\hat{h}_{p,c} = \bar{p}_{c,p} - \bar{p}_{c,\neg p}$ (pollster mean minus leave-one-out grand mean, both $w_i$-weighted).
Adjusted observations: $\tilde{p}_{i,c} = \text{clip}(p_{i,c} - \hat{h}_{p,c},\ 0.01,\ 0.99)$
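The house-effect correction can be sketched as follows for a single candidate. This is one reading of the definitions above, not the script's confirmed implementation: the helper names (`weighted_mean`, `house_effects`, `adjust`) and the toy polls from pollsters "A" and "B" are made up for illustration, and the fallback of a zero effect when no other pollster exists is an assumption.

```python
def weighted_mean(pairs):
    """Weighted mean of (share, weight) pairs."""
    total = sum(w for _, w in pairs)
    return sum(s * w for s, w in pairs) / total


def house_effects(polls):
    """polls: list of (pollster, share, weight) tuples for one candidate.

    Returns {pollster: weighted pollster mean minus the weighted mean of
    all polls from every *other* pollster (the leave-one-out grand mean)}.
    """
    effects = {}
    for p in sorted({name for name, _, _ in polls}):
        own = [(s, w) for name, s, w in polls if name == p]
        rest = [(s, w) for name, s, w in polls if name != p]
        # Assumption: with no other pollster to compare against, use 0.
        effects[p] = weighted_mean(own) - weighted_mean(rest) if rest else 0.0
    return effects


def adjust(share, effect, lo=0.01, hi=0.99):
    """Subtract the house effect, then clip to [lo, hi]."""
    return min(max(share - effect, lo), hi)


# Toy data (not from the scraped table): shares as fractions, weights w_i.
toy_polls = [("A", 0.40, 1.0), ("A", 0.44, 1.0), ("B", 0.50, 2.0)]
fx = house_effects(toy_polls)
adjusted = [adjust(share, fx[name]) for name, share, _ in toy_polls]
print(fx, adjusted)
```

In the toy data, pollster "A" averages 8 points below pollster "B", so its shares are shifted up by that amount before aggregation and "B"'s are shifted down.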
Beta-Binomial conjugate update (prior $\alpha_0=1.5, \beta_0=3.0$): $$\alpha_{\text{post}} = \alpha_0 + \sum_i w_i \cdot N_{\text{eff}} \cdot \tilde{p}_{i,c}, \quad \beta_{\text{post}} = \beta_0 + \sum_i w_i \cdot N_{\text{eff}} \cdot (1-\tilde{p}_{i,c})$$ with $N_{\text{eff}}=200$ (design-effect-adjusted effective sample size per unit of poll weight).
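A minimal sketch of the conjugate update, assuming the adjusted shares and weights have already been computed. The function names are hypothetical, and the interval uses a normal approximation to the Beta posterior (reasonable when $\alpha_{\text{post}}$ and $\beta_{\text{post}}$ are large); the original script may compute exact Beta quantiles instead.

```python
import math


def beta_posterior(adj_shares, weights, alpha0=1.5, beta0=3.0, n_eff=200.0):
    """Add weighted pseudo-counts to the Beta(alpha0, beta0) prior."""
    alpha = alpha0 + sum(w * n_eff * p for p, w in zip(adj_shares, weights))
    beta = beta0 + sum(w * n_eff * (1.0 - p) for p, w in zip(adj_shares, weights))
    return alpha, beta


def summarize(alpha, beta, z=1.96):
    """Posterior mean and an approximate 95% interval (normal approximation)."""
    total = alpha + beta
    mean = alpha / total
    var = alpha * beta / (total ** 2 * (total + 1.0))  # variance of Beta(alpha, beta)
    half = z * math.sqrt(var)
    return mean, max(0.0, mean - half), min(1.0, mean + half)


# Toy inputs (not from the scraped table): two adjusted shares with weights.
a_post, b_post = beta_posterior([0.48, 0.42], [1.2, 0.6])
print(summarize(a_post, b_post))
```

Because every poll contributes $w_i \cdot N_{\text{eff}}$ pseudo-observations, the interval width shrinks roughly with the square root of $\sum_i w_i \cdot N_{\text{eff}}$, so the choice of $N_{\text{eff}}$ directly controls how narrow the reported intervals are.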
Result
| Candidate | Post. Mean | 95% CI | N polls |
|---|---|---|---|
| JD Vance | 42.3% | [41.5%, 43.1%] | 73 |
| Marco Rubio | 13.5% | [13.0%, 14.1%] | 71 |
| Ron DeSantis | 8.5% | [8.0%, 8.9%] | 70 |
| Donald Trump Jr. | 13.3% | [12.6%, 14.0%] | 44 |
Key house effects on JD Vance: McLaughlin -6.4pp (13 polls), Overton -8.6pp (3 polls), AtlasIntel +8.3pp (5 polls), Emerson +9.7pp (5 polls). The estimated effects span about 18pp between Overton and Emerson, and this degree of pollster heterogeneity justifies the correction.
Reproduction
source ~/.pmvenv/bin/activate
python3 /mnt/projects/tnt_85c10df4451042ca/prj_c7cb91b70b2f42ac/d6_bayesian_poll_agg.py
# Snapshots written to /tmp/pm_data/
Failure mode / next step
- Main gap: poll share ≠ win probability. Nomination via delegates is winner-take-most; a 42% poll leader could easily win 60%+ of delegates or be blocked at a brokered convention.
- House effects are estimated within-sample (no holdout), so the reported CIs understate the true uncertainty.
- Only 30 days of Polymarket price history are available, so long-run mean reversion cannot be tested.
- Next: obtain a delegate-allocation model (or a prediction-market proxy) to map poll share → win probability more accurately.