[polymarket][mm] CLOB MM Backtest Harness — PASS

Status: active
Tags: polymarket, mm, backtest, harness, pass
Priority: 4 · Source: polymarket-mm · Created: 2026-05-20 · Updated: 2026-05-20

## Hypothesis
A mid-price–crossing fill model replaying Polymarket CLOB prices-history can estimate symmetric MM PnL (spread capture minus adverse selection) accurately enough to rank parameterisations.

## Data used
- Endpoint: `GET https://clob.polymarket.com/prices-history?market=<token_id>&interval=max&fidelity=1`
- Markets: BTC-hit-$1M (`105267...5810`), PSG-CL (`104259...0834`), Colorado-NHL (`101738...9479`)
- Sample size: ~4,260 observations per market, 2026-04-20 → 2026-05-20 (30 days)
- Median inter-observation interval: 600 s (~10 min); range 33 s – 3,042 s (irregular)
- Sample rows (btc_1m):
  | t (unix) | p |
  |---|---|
  | 1776718843 | 0.4915 |
  | 1776719425 | 0.4920 |
  | 1776720037 | 0.4910 |
  | 1776721011 | 0.4915 |
  | 1776722608 | 0.4905 |
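
A minimal sketch of loading one market's history and checking its sampling interval. The `{"history": [{"t": ..., "p": ...}]}` payload shape is an assumption about the endpoint's response, and `history_url`, `parse_history` and `interval_stats` are illustrative names, not functions from `pm_mm_backtest.py`. The stand-in payload reuses the five sample rows, so no network call is made.

```python
import json
import statistics

BASE_URL = "https://clob.polymarket.com/prices-history"


def history_url(token_id: str) -> str:
    """Build the request URL with the query parameters listed above."""
    return f"{BASE_URL}?market={token_id}&interval=max&fidelity=1"


def parse_history(payload: str) -> list[tuple[int, float]]:
    """Return (t, p) pairs sorted by timestamp.

    Assumption: the payload has the shape {"history": [{"t": ..., "p": ...}]}.
    """
    rows = json.loads(payload)["history"]
    return sorted((int(r["t"]), float(r["p"])) for r in rows)


def interval_stats(series: list[tuple[int, float]]) -> tuple[float, int, int]:
    """Median, minimum and maximum gap (seconds) between observations."""
    times = [t for t, _ in series]
    gaps = [b - a for a, b in zip(times, times[1:])]
    return statistics.median(gaps), min(gaps), max(gaps)


# Stand-in payload built from the five sample rows in the table above.
sample_payload = json.dumps({"history": [
    {"t": 1776718843, "p": 0.4915},
    {"t": 1776719425, "p": 0.4920},
    {"t": 1776720037, "p": 0.4910},
    {"t": 1776721011, "p": 0.4915},
    {"t": 1776722608, "p": 0.4905},
]})
series = parse_history(sample_payload)
print(interval_stats(series))
```

Running the same statistics over a full snapshot is a quick way to confirm the irregular spacing before any per-bar quantity is computed.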

## Method

At each observation $t$ with mid-price $m_t$, post a bid $b_t$ and an ask $a_t$ separated by the quote width $w$, each sized at a dollar depth $d$:

$$b_t = \max(0.001,\, m_t - w/2), \quad a_t = \min(0.999,\, m_t + w/2)$$
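
The clamped quotes can be sketched as a small helper. The function name `quotes` and its `lo`/`hi` keyword arguments are illustrative; the bounds 0.001 and 0.999 come from the formula above.

```python
def quotes(mid: float, width: float, lo: float = 0.001, hi: float = 0.999):
    """Return (bid, ask) placed symmetrically around mid, clamped to [lo, hi]."""
    bid = max(lo, mid - width / 2)
    ask = min(hi, mid + width / 2)
    return bid, ask


print(quotes(0.4915, 0.002))  # interior case: neither clamp binds
print(quotes(0.0015, 0.002))  # near zero: the bid is clamped to 0.001
```

When a clamp binds, the quotes are no longer symmetric around the mid, so the effective spread near 0 or 1 is narrower than $w$.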

Fill rule (crossing-price approximation):
- **Bid fill** if $m_{t+1} \le b_t$: buy $d / m_t$ tokens, pay $b_t$ per token
- **Ask fill** if $m_{t+1} \ge a_t$: sell $d / m_t$ tokens, receive $a_t$ per token

MTM PnL at each step:
$$\text{PnL}_t = \text{cash}_t + \text{inventory}_t \times m_t$$
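
A sketch of the replay loop implied by the fill rule and the MTM formula, not the code in `pm_mm_backtest.py`. It assumes that fills from quotes posted at $t$ are booked and marked at $m_{t+1}$, that at most one side fills per step (true whenever $w > 0$), and that inventory may go negative. `Book` and `run_backtest` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Book:
    cash: float = 0.0       # dollars
    inventory: float = 0.0  # tokens; negative means short
    fills: int = 0


def run_backtest(mids, width, depth):
    """Replay a mid-price series; return (MTM PnL path, final Book)."""
    book = Book()
    pnl = [0.0]  # flat before the first quote
    for m_t, m_next in zip(mids, mids[1:]):
        bid = max(0.001, m_t - width / 2)
        ask = min(0.999, m_t + width / 2)
        size = depth / m_t  # tokens per fill, d / m_t
        if m_next <= bid:      # bid fill: buy at b_t
            book.cash -= size * bid
            book.inventory += size
            book.fills += 1
        elif m_next >= ask:    # ask fill: sell at a_t
            book.cash += size * ask
            book.inventory -= size
            book.fills += 1
        # Assumption: fills from quotes posted at t are marked at m_{t+1}.
        pnl.append(book.cash + book.inventory * m_next)
    return pnl, book


mids = [0.4915, 0.4920, 0.4910, 0.4915, 0.4905]  # sample rows from the data table
pnl_path, book = run_backtest(mids, width=0.0008, depth=200.0)
print(book.fills, round(pnl_path[-1], 4))
```

Maker fees are omitted because the note takes them as 0%.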

Sharpe annualised using time-weighted increments (actual $\Delta t$ in hours):
$$S = \frac{\mu(\Delta\text{PnL}/\sqrt{\Delta t})}{\sigma(\Delta\text{PnL}/\sqrt{\Delta t})} \times \sqrt{8760}$$
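
The Sharpe formula can be sketched directly from timestamps and the MTM path. The use of the population standard deviation and the handling of degenerate inputs (returning NaN) are assumptions; the note does not specify either. `annualised_sharpe` is an illustrative name.

```python
import math
import statistics


def annualised_sharpe(times_s, pnl, hours_per_year=8760.0):
    """Sharpe of PnL increments scaled by 1/sqrt(dt), dt in hours.

    times_s: unix timestamps in seconds; pnl: MTM PnL at the same timestamps.
    """
    scaled = []
    for i in range(1, len(pnl)):
        dt_h = (times_s[i] - times_s[i - 1]) / 3600.0
        if dt_h <= 0:
            continue  # skip duplicate or out-of-order timestamps
        scaled.append((pnl[i] - pnl[i - 1]) / math.sqrt(dt_h))
    if len(scaled) < 2:
        return float("nan")
    sigma = statistics.pstdev(scaled)  # assumption: population std
    if sigma == 0:
        return float("nan")
    return statistics.fmean(scaled) / sigma * math.sqrt(hours_per_year)


# Hourly stamps with increments of 1 and 3 dollars.
print(annualised_sharpe([0, 3600, 7200], [0.0, 1.0, 4.0]))
```

For equally spaced observations the $\sqrt{\Delta t}$ factor cancels in the ratio, so the statistic reduces to the per-bar Sharpe multiplied by $\sqrt{8760}$.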

Fees: Polymarket CLOB maker = 0% (confirmed); taker = 2% (borne by counterparty, not us).

## Result
Harness runs end-to-end. Grid sweep (7 widths × 3 depths × 3 markets = 63 configs) completes in <5 s.
BTC $1M best config: width=0.002, depth=$200 → PnL=$19.91/30d, Sharpe=0.971, 120 fills.
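
A sketch of how the 63-point grid could be enumerated and serialised. The width and depth values are hypothetical: only width=0.002 and depth=200 appear in the results, and the actual grids used by `--sweep` are not stated. `placeholder_run` stands in for a real backtest call and returns empty metrics.

```python
import csv
import io
from itertools import product

MARKETS = ["btc_1m", "psg_cl", "colo_nhl"]
# Hypothetical grids: only width=0.002 and depth=200 appear in the results above.
WIDTHS = [0.002, 0.004, 0.006, 0.008, 0.010, 0.015, 0.020]
DEPTHS = [50, 100, 200]


def sweep(run_config):
    """Call run_config(market, width, depth) for every grid point."""
    rows = []
    for market, width, depth in product(MARKETS, WIDTHS, DEPTHS):
        metrics = run_config(market, width, depth)
        rows.append({"market": market, "width": width, "depth": depth, **metrics})
    return rows


def to_csv(rows) -> str:
    """Serialise sweep rows to CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def placeholder_run(market, width, depth):
    """Stand-in for a real backtest call; returns empty metrics."""
    return {"pnl": None, "sharpe": None, "fills": None}


rows = sweep(placeholder_run)
print(len(rows))
```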

## Reproduction
```bash
source ~/.pmvenv/bin/activate
python3 /home/workspace/pm_mm_backtest.py --sweep
# Sweep results at /tmp/pm_data/sweep_results.csv
```
Data snapshots: `/tmp/pm_data/{btc_1m,psg_cl,colo_nhl}_prices_f1.json`

## Failure mode / next step
**Critical fill model bias:** The crossing-price rule overestimates fills — within a 10-min bar, price may cross our quote and revert without a real taker touching our level. Mitigation: use actual CLOB trade tape (unavailable without auth key) or an intra-bar volatility correction (scale fill prob by $\text{erf}(w / (2\sigma_{\text{bar}}))$). Queue position is also not modeled — real fills compete with other makers at the same price level.
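
A sketch of the proposed volatility correction. Estimating $\sigma_{\text{bar}}$ as the standard deviation of bar-to-bar mid changes is a hypothetical proxy, and applying the scale as a fractional fill size is an assumption; the note specifies neither. All function names are illustrative.

```python
import math
import statistics


def estimate_sigma_bar(mids):
    """Hypothetical proxy for sigma_bar: std of bar-to-bar mid changes."""
    changes = [b - a for a, b in zip(mids, mids[1:])]
    return statistics.pstdev(changes) if changes else 0.0


def fill_scale(width, sigma_bar):
    """Scale factor erf(w / (2 * sigma_bar)) applied to a crossing-rule fill."""
    if sigma_bar <= 0.0:
        return 1.0  # assumption: no measured volatility, leave the fill unchanged
    return math.erf(width / (2.0 * sigma_bar))


def expected_fill_size(depth, mid, width, sigma_bar):
    """Assumption: apply the scale as a fractional fill of d / m_t tokens."""
    return fill_scale(width, sigma_bar) * depth / mid


mids = [0.4915, 0.4920, 0.4910, 0.4915, 0.4905]  # sample rows from the data table
sigma = estimate_sigma_bar(mids)
print(sigma, fill_scale(0.002, sigma))
```

The factor tends to 1 when the quote width is large relative to bar volatility and shrinks toward 0 when the width is small relative to it.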
