[polymarket][sports] EPL Poisson model calibration — PASS (beats uniform) / FAIL (overconfident on favorites)

active
polymarketsportseplpoissoncalibrationpass   Priority: 3   Source: polymarket-sports   Created: 2026-05-20   Updated: 2026-05-20

Hypothesis

A Dixon-Coles Poisson goal model trained on the two prior EPL seasons (2022/23 and 2023/24) will beat a uniform 1/3 prior on 1x2 outcome prediction for the next season (2024/25), measured by log-loss and Brier score.

Data Used

Sample rows (test set):

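| Date | Home | Away | FTHG | FTAG | FTR | model_h | model_a |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 16/08/2024 | Man United | Fulham | 1 | 0 | H | 0.615 | 0.173 |
| 17/08/2024 | Arsenal | Wolves | 3 | 0 | H | 0.812 | 0.067 |
| 17/08/2024 | Everton | Brighton | 0 | 3 | A | 0.314 | 0.376 |
| 17/08/2024 | Newcastle | Southampton | 1 | 0 | H | 0.676 | 0.143 |
| 17/08/2024 | West Ham | Aston Villa | 3 | 1 | H | 0.344 | 0.340 |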
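
The sample rows list only `model_h` and `model_a`. A minimal sketch of how the draw probability and the probability assigned to the actual result could be read off each row, assuming the three outcome probabilities sum to 1 (the note does not state this, and the helper names are hypothetical):

```python
# Sample rows copied from the table above: (home, away, FTR, model_h, model_a).
ROWS = [
    ("Man United", "Fulham", "H", 0.615, 0.173),
    ("Arsenal", "Wolves", "H", 0.812, 0.067),
    ("Everton", "Brighton", "A", 0.314, 0.376),
    ("Newcastle", "Southampton", "H", 0.676, 0.143),
    ("West Ham", "Aston Villa", "H", 0.344, 0.340),
]


def implied_draw(model_h: float, model_a: float) -> float:
    """Draw probability under the ASSUMPTION that H, D, A sum to 1."""
    return 1.0 - model_h - model_a


def prob_of_result(ftr: str, model_h: float, model_a: float) -> float:
    """Probability the model assigned to the observed full-time result."""
    probs = {"H": model_h, "A": model_a, "D": implied_draw(model_h, model_a)}
    return probs[ftr]


for home, away, ftr, p_h, p_a in ROWS:
    p_d = implied_draw(p_h, p_a)
    p_hit = prob_of_result(ftr, p_h, p_a)
    print(f"{home} v {away}: P(D)={p_d:.3f}, P(actual={ftr})={p_hit:.3f}")
```

The last column printed is the quantity whose negative log enters the 3-way log-loss.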

Method

Model: Independent Poisson for home goals $g_h$ and away goals $g_a$ (as written here, without the Dixon-Coles low-score correction term):

$$\lambda_h = \exp(\alpha_i + \delta_j + \eta), \quad \lambda_a = \exp(\alpha_j + \delta_i)$$

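where $i$ is the home team, $j$ is the away team, $\alpha$ = attack strength, $\delta$ = defense weakness (signed; a higher value means more goals conceded), and $\eta$ = home-field advantage (HFA, log scale).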
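
As an illustration of the rate equations, the sketch below computes $\lambda_h$ and $\lambda_a$ for one fixture. The function name and the team ratings are hypothetical placeholders, not fitted values from this note. Only $\eta = \log(1.273)$ is taken from the Result section.

```python
import math


def expected_goals(attack_home, defense_home, attack_away, defense_away, eta):
    """Return (lambda_h, lambda_a) for a fixture.

    lambda_h = exp(alpha_i + delta_j + eta)
    lambda_a = exp(alpha_j + delta_i)
    where i is the home team and j is the away team.
    """
    lam_h = math.exp(attack_home + defense_away + eta)
    lam_a = math.exp(attack_away + defense_home)
    return lam_h, lam_a


# HFA on the log scale, from the reported multiplicative value 1.273x.
ETA = math.log(1.273)

# Hypothetical ratings for illustration only (NOT fitted values).
lam_h, lam_a = expected_goals(
    attack_home=0.30, defense_home=-0.10,
    attack_away=0.05, defense_away=0.10,
    eta=ETA,
)
print(f"lambda_h={lam_h:.3f}, lambda_a={lam_a:.3f}")
```

For two teams with identical ratings, the ratio $\lambda_h / \lambda_a$ reduces to $\exp(\eta)$, which is the multiplicative HFA.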

Fit: MLE via L-BFGS-B, minimizing the negative log-likelihood $\mathcal{L}$ summed over training matches $k$:

$$\mathcal{L} = -\sum_k \left[\log P(g_h^k \mid \lambda_h^k) + \log P(g_a^k \mid \lambda_a^k)\right]$$
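
A minimal sketch of the objective, written with the standard library only. The function names and the dict-based parameter layout are assumptions for illustration. The actual script may pack the parameters into a flat vector and pass the objective to an optimizer such as `scipy.optimize.minimize(..., method="L-BFGS-B")`.

```python
import math


def poisson_logpmf(k: int, lam: float) -> float:
    """log P(k | lam) for a Poisson distribution."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def neg_log_likelihood(matches, attack, defense, eta):
    """Negative log-likelihood of independent home/away Poisson goals.

    matches: iterable of (home, away, home_goals, away_goals)
    attack, defense: dicts mapping team name -> log-scale rating
    eta: home-field advantage on the log scale
    """
    total = 0.0
    for home, away, g_h, g_a in matches:
        lam_h = math.exp(attack[home] + defense[away] + eta)
        lam_a = math.exp(attack[away] + defense[home])
        total -= poisson_logpmf(g_h, lam_h) + poisson_logpmf(g_a, lam_a)
    return total


# Toy check: one match, all parameters zero, so both rates equal 1.
teams = ["Man United", "Fulham"]
zero = {t: 0.0 for t in teams}
nll = neg_log_likelihood([("Man United", "Fulham", 1, 0)], zero, zero, eta=0.0)
print(f"NLL at zero parameters: {nll:.4f}")
```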

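Outcomes: Sum the joint Poisson pmf over the cells of an $8\times8$ score grid, for example for a home win:

$$P(H) = \sum_{g_h > g_a} \text{Poisson}(g_h \mid \lambda_h)\cdot\text{Poisson}(g_a \mid \lambda_a)$$

$P(D)$ and $P(A)$ follow by summing over $g_h = g_a$ and $g_h < g_a$ respectively.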
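
The grid sum can be sketched as follows. Two details are assumptions not stated in the note: that the $8\times8$ grid covers 0 to 7 goals per side, and that the truncated probabilities are renormalized to sum to 1.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """P(k | lam) for a Poisson distribution."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def outcome_probs(lam_h: float, lam_a: float, max_goals: int = 7):
    """Return (P(H), P(D), P(A)) from a (max_goals+1) x (max_goals+1) grid."""
    p_home = p_draw = p_away = 0.0
    for g_h in range(max_goals + 1):
        for g_a in range(max_goals + 1):
            p = poisson_pmf(g_h, lam_h) * poisson_pmf(g_a, lam_a)
            if g_h > g_a:
                p_home += p
            elif g_h == g_a:
                p_draw += p
            else:
                p_away += p
    # ASSUMPTION: renormalize to absorb the mass truncated beyond the grid.
    total = p_home + p_draw + p_away
    return p_home / total, p_draw / total, p_away / total


# Hypothetical rates for illustration only.
p_h, p_d, p_a = outcome_probs(1.6, 1.1)
print(f"P(H)={p_h:.3f}, P(D)={p_d:.3f}, P(A)={p_a:.3f}")
```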

Identifiability: 23 attack + 23 defense params + 1 HFA = 47 params.

Result

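| Metric | Model | Baseline (uniform 1/3) |
| --- | --- | --- |
| Brier score (home win) | 0.2219 | 0.2500 (naive) |
| Brier score (away win) | 0.2058 | 0.2500 |
| Mean log-loss (3-way) | 1.0193 | 1.0986 |
| Skill vs uniform | +7.2% | |
| HFA (multiplicative) | 1.273x | |

Note: 1.0986 is $\ln 3$, the log-loss of the uniform 1/3 forecast. 0.2500 is the Brier score of a constant 0.5 forecast, so the Brier baseline rows use a naive 0.5 reference rather than the 1/3 prior.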
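
The reported skill is consistent with $1 - 1.0193 / \ln 3$. That the note computes it this way is an assumption. The sketch below reproduces that arithmetic and shows the per-match Brier and log-loss terms (function names are hypothetical).

```python
import math


def brier(prob: float, outcome: int) -> float:
    """Squared error of a binary forecast; outcome is 0 or 1."""
    return (prob - outcome) ** 2


def log_loss_3way(p_h: float, p_d: float, p_a: float, ftr: str) -> float:
    """Negative log of the probability assigned to the observed result."""
    return -math.log({"H": p_h, "D": p_d, "A": p_a}[ftr])


# Uniform 1/3 prior: log-loss is ln(3) whatever the result.
uniform_log_loss = log_loss_3way(1 / 3, 1 / 3, 1 / 3, "H")

# Model mean log-loss as reported in the results table.
model_log_loss = 1.0193

# ASSUMPTION: skill = 1 - model / baseline, on mean log-loss.
skill = 1.0 - model_log_loss / uniform_log_loss
print(f"uniform log-loss={uniform_log_loss:.4f}, skill={skill:+.1%}")

# A constant 0.5 forecast scores 0.25 on every match, win or not.
print(brier(0.5, 1), brier(0.5, 0))
# A constant 1/3 forecast scores 4/9 on wins and 1/9 otherwise,
# so its mean depends on the observed win rate.
print(round(brier(1 / 3, 1), 4), round(brier(1 / 3, 0), 4))
```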

Top team ratings (attack − defense, log scale):

- Man City: +1.075
- Arsenal: +0.950
- Liverpool: +0.658
- Newcastle: +0.537

Calibration (home win, 10 buckets):

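| Bucket | mean_pred | mean_actual | n |
| --- | --- | --- | --- |
| 0 | 0.116 | 0.188 | 16 |
| 1 | 0.198 | 0.069 | 29 |
| 2 | 0.285 | 0.325 | 40 |
| 3 | 0.358 | 0.327 | 55 |
| 4 | 0.443 | 0.490 | 51 |
| 5 | 0.517 | 0.457 | 46 |
| 6 | 0.604 | 0.532 | 47 |
| 7 | 0.686 | 0.522 | 23 |
| 8 | 0.761 | 0.609 | 23 |
| 9 | 0.846 | 0.917 | 12 |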

Key miscalibration: Buckets 7 and 8 (strong home favorites, mean_pred 0.686 and 0.761, n = 23 each) show actual home-win rates of only 0.522 and 0.609, so the model is overconfident on strong home teams by roughly 0.15 to 0.16. This is the primary failure mode.
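
The per-bucket gaps can be recomputed directly from the calibration table. The sketch below also adds a rough binomial standard error per bucket and a sample-weighted mean absolute gap. Both are illustrative additions, not figures reported in this note.

```python
import math

# (mean_pred, mean_actual, n) per bucket, copied from the calibration table.
BUCKETS = [
    (0.116, 0.188, 16), (0.198, 0.069, 29), (0.285, 0.325, 40),
    (0.358, 0.327, 55), (0.443, 0.490, 51), (0.517, 0.457, 46),
    (0.604, 0.532, 47), (0.686, 0.522, 23), (0.761, 0.609, 23),
    (0.846, 0.917, 12),
]

total_n = sum(n for _, _, n in BUCKETS)


def bucket_report(buckets):
    """Return (bucket index, pred - actual, approx binomial SE) per bucket."""
    rows = []
    for idx, (pred, actual, n) in enumerate(buckets):
        gap = pred - actual  # positive means the model is overconfident
        se = math.sqrt(pred * (1.0 - pred) / n)  # SE if pred were the true rate
        rows.append((idx, gap, se))
    return rows


report = bucket_report(BUCKETS)

# Sample-weighted mean absolute calibration gap across all buckets.
ece = sum(n * abs(pred - actual) for pred, actual, n in BUCKETS) / total_n

for idx, gap, se in report:
    print(f"bucket {idx}: gap={gap:+.3f}, approx SE={se:.3f}")
print(f"n={total_n}, weighted mean |gap|={ece:.4f}")
```

Comparing each gap with its standard error gives a first read on how much of the miscalibration could be sampling noise, given the small bucket sizes.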

Reproduction

source ~/.pmvenv/bin/activate
python3 /mnt/projects/tnt_85c10df4451042ca/prj_c7cb91b70b2f42ac/d5_epl_poisson.py
# Data: /tmp/pm_data/epl/E0_{2223,2324,2425}.csv
# Output: /tmp/pm_data/model_results.json

Failure Mode / Next Step
