[polymarket][struct-arb] D7 Structural Arb Scanner — PASS (tool)

active

Tags: polymarket, struct-arb, scanner, pass | Priority: 3 | Source: polymarket-struct-arb | Created: 2026-05-20 | Updated: 2026-05-20

Hypothesis

A single scanner can systematically walk all active Polymarket markets and surface every structural (model-free) violation across three families: binary YES+NO baskets, multi-outcome events whose outcomes are mutually exclusive and exhaustive (MEE), and temporal implication chains.

Data used

GET https://gamma-api.polymarket.com/markets?closed=false&active=true&limit=100 — paginated, 5 000 markets fetched 2026-05-20
GET https://gamma-api.polymarket.com/events?closed=false&active=true&limit=100 — paginated, 2 000 events fetched 2026-05-20
GET https://clob.polymarket.com/book?token_id=<id> — called for ≤1 400 token books (300 binary markets × 2 + MEE/chain books)
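The paginated fetch can be sketched roughly as below. This is a minimal illustration, not the scanner's actual code: it assumes the gamma endpoints accept an `offset` parameter alongside `limit` and that a short page marks the end of the data. `fetch_page` is a hypothetical callable, injected so the loop can be exercised offline.

```python
from typing import Callable, Dict, List

Page = List[Dict]


def fetch_all(fetch_page: Callable[[int, int], Page],
              limit: int = 100, max_rows: int = 5000) -> Page:
    """Collect rows page by page until a short page or max_rows is reached.

    fetch_page(offset, limit) is hypothetical; in a live run it would wrap
    an HTTP GET with offset/limit query parameters (an assumption here).
    """
    rows: Page = []
    offset = 0
    while len(rows) < max_rows:
        page = fetch_page(offset, limit)
        rows.extend(page)
        if len(page) < limit:  # short or empty page: nothing left to fetch
            break
        offset += limit
    return rows[:max_rows]


# Offline stand-in for the API: 250 fake rows served in slices.
_FAKE = [{"id": str(i)} for i in range(250)]


def fake_page(offset: int, limit: int) -> Page:
    return _FAKE[offset:offset + limit]


all_rows = fetch_all(fake_page)
```

Capping at `max_rows` mirrors the 5 000-market and 2 000-event limits quoted above; a live run would also want retry and rate-limit handling, which is omitted here.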

Sample gamma market row:

{"id":"540817","question":"New Rihanna Album before GTA VI?",
 "clobTokenIds":["98022...","53831..."],
 "outcomePrices":["0.515","0.485"],"active":true}

Sample CLOB book top-of-book:

YES: best_bid=(0.51, 104), best_ask=(0.52, 89)
NO:  best_bid=(0.48, 89),  best_ask=(0.49, 104)
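As an illustration of how a gamma row maps to the two token books, the sketch below pairs each outcome with its CLOB token id and indicative price. Two assumptions are made: the first array entry is the YES outcome (as the sample suggests), and the array fields may sometimes arrive as JSON-encoded strings, so both shapes are accepted. `parse_market` is a hypothetical helper name.

```python
import json
from typing import Dict, List, Union


def _as_list(value: Union[str, List[str]]) -> List[str]:
    """Accept a list or a JSON-encoded list string (assumed possible)."""
    return json.loads(value) if isinstance(value, str) else list(value)


def parse_market(row: Dict) -> Dict[str, Dict[str, object]]:
    """Map each outcome label to its CLOB token id and indicative price."""
    tokens = _as_list(row["clobTokenIds"])
    prices = [float(p) for p in _as_list(row["outcomePrices"])]
    if len(tokens) != 2 or len(prices) != 2:
        raise ValueError("expected a binary market with two outcomes")
    labels = ("YES", "NO")  # assumption: YES comes first, as in the sample
    return {label: {"token_id": token, "price": price}
            for label, token, price in zip(labels, tokens, prices)}


# The sample row from above, with its token ids left truncated as given.
sample = {"id": "540817", "question": "New Rihanna Album before GTA VI?",
          "clobTokenIds": ["98022...", "53831..."],
          "outcomePrices": ["0.515", "0.485"], "active": True}
parsed = parse_market(sample)
price_sum = parsed["YES"]["price"] + parsed["NO"]["price"]
```

Note that `outcomePrices` are indicative only; the arbitrage checks use the bid and ask from the CLOB book, not these values.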

Method

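Type A: binary basket

With $b_Y, b_N$ the best bids and $a_Y, a_N$ the best asks of the YES and NO tokens:
$$\text{sell\_gross} = b_Y + b_N - 1, \quad \text{buy\_gross} = 1 - a_Y - a_N$$
Size (sell side) = $\min(\text{size}_{b_Y}, \text{size}_{b_N})$.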
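The Type A check can be worked through on the sample top-of-book above. The sketch below is illustrative only: the `Quote` container and `type_a` function are hypothetical names, and the buy-side size (minimum of the two ask sizes) is an assumption made by symmetry with the stated sell-side size.

```python
from typing import NamedTuple, Tuple


class Quote(NamedTuple):
    bid: Tuple[float, float]  # (price, size)
    ask: Tuple[float, float]  # (price, size)


class TypeAResult(NamedTuple):
    sell_gross: float
    sell_size: float
    buy_gross: float
    buy_size: float


def type_a(yes: Quote, no: Quote) -> TypeAResult:
    """Gross edge per share for selling or buying the YES+NO basket."""
    sell_gross = yes.bid[0] + no.bid[0] - 1.0   # sell both at the bid
    buy_gross = 1.0 - yes.ask[0] - no.ask[0]    # buy both at the ask
    sell_size = min(yes.bid[1], no.bid[1])
    buy_size = min(yes.ask[1], no.ask[1])       # assumed by symmetry
    return TypeAResult(sell_gross, sell_size, buy_gross, buy_size)


# Sample top-of-book from the "Data used" section.
yes = Quote(bid=(0.51, 104), ask=(0.52, 89))
no = Quote(bid=(0.48, 89), ask=(0.49, 104))
result = type_a(yes, no)
```

For the sample book both gross values come out at about -0.01, so neither side is a violation; a positive value on either side would flag one.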

Type B: MEE sell-all / buy-all

$$\text{sell\_gross} = \sum_i b_i - 1, \quad \text{buy\_gross} = 1 - \sum_i a_i$$
Only applies when outcomes partition the space (MEE). Non-MEE events (cumulative thresholds, multiple simultaneous winners) are flagged and excluded.
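A minimal sketch of the Type B sums, assuming the event has already been confirmed MEE. The three-outcome quotes are made-up numbers for illustration, not data from the scan, and `type_b` is a hypothetical function name.

```python
from typing import List, Tuple


def type_b(bids: List[float], asks: List[float]) -> Tuple[float, float]:
    """Return (sell_gross, buy_gross) for a full MEE basket.

    bids[i] and asks[i] are the best bid and ask of outcome i's YES token.
    """
    if len(bids) != len(asks) or len(bids) < 2:
        raise ValueError("need matching bid/ask lists with >= 2 outcomes")
    sell_gross = sum(bids) - 1.0   # sell every outcome, pay out exactly 1
    buy_gross = 1.0 - sum(asks)    # buy every outcome, receive exactly 1
    return sell_gross, buy_gross


# Hypothetical three-outcome event (illustrative values only).
bids = [0.40, 0.35, 0.28]
asks = [0.42, 0.37, 0.30]
sell_gross, buy_gross = type_b(bids, asks)
```

With these made-up quotes the bids sum to about 1.03, so the sell-all side shows a gross edge of roughly 0.03, while the buy-all side is negative.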

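Type C: implication (time-consistency)

For a pair of deadlines $(T_i, T_j)$ with $T_i < T_j$, let $A$ be the event resolving YES by $T_i$ and $B$ the same event resolving YES by $T_j$, so $A \subseteq B$ (earlier implies later). Selling $A$ at its bid and buying $B$ at its ask gives
$$\text{gross} = b_A - a_B$$
Worst-case payoff = $b_A - a_B$, reached when both resolve YES or both resolve NO; if $A$ resolves NO and $B$ resolves YES, the payoff is $b_A - a_B + 1$. Only pairs with different endDate fields are genuine; same-date pairs are MEE or independent events (not implications).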
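The worst-case claim for Type C can be checked by enumerating the resolution states that are consistent with $A \subseteq B$. The sketch below assumes the trade is "sell A YES at its bid, buy B YES at its ask"; the prices are hypothetical and `implication_payoffs` is an illustrative name.

```python
from typing import Dict


def implication_payoffs(b_a: float, a_b: float) -> Dict[str, float]:
    """Payoff per share in each state consistent with A implying B."""
    gross = b_a - a_b  # cash collected up front
    # (payout owed on the short A leg, payout received on the long B leg)
    states = {
        "A=YES,B=YES": (1.0, 1.0),
        "A=NO,B=YES": (0.0, 1.0),
        "A=NO,B=NO": (0.0, 0.0),
    }
    # A=YES,B=NO is excluded: it contradicts the implication.
    return {name: gross - owed + received
            for name, (owed, received) in states.items()}


# Hypothetical quotes: the earlier-deadline bid exceeds the later-deadline ask.
payoffs = implication_payoffs(b_a=0.30, a_b=0.27)
worst_case = min(payoffs.values())
```

The minimum over the three states equals the gross, which is why a positive $b_A - a_B$ is sufficient for a model-free violation.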

Fee model: Polymarket CLOB charges 0 % maker / 0 % taker; gas on Polygon ≈ 0. True cost = spread already embedded in bid/ask prices. Conservative sensitivity: subtract 1 % per leg.
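One way to apply the conservative sensitivity is sketched below. The note does not say whether "1 % per leg" is relative to notional or to the $1 payout; this sketch assumes a flat 0.01 per share per leg, which is an interpretation rather than the scanner's confirmed behaviour. `net_edge` is a hypothetical helper.

```python
def net_edge(gross: float, legs: int, per_leg_cost: float = 0.01) -> float:
    """Gross edge per share minus a flat cost per leg (assumed 0.01 each)."""
    if legs < 1:
        raise ValueError("a trade needs at least one leg")
    return gross - legs * per_leg_cost


# A binary basket (Type A) or an implication pair (Type C) has two legs;
# an MEE basket (Type B) has one leg per outcome.
two_leg_net = net_edge(0.03, legs=2)
three_leg_net = net_edge(0.03, legs=3)
```

Under this reading a 0.03 gross edge survives on a two-leg trade but not on a three-outcome MEE basket, so the sensitivity bites harder as the number of legs grows.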

Result

Scanner runs end-to-end in ~8 min against live API. All snapshots persisted to /tmp/pm_data/. For Type A there is no false-negative risk within the top 300 binary markets by volume, because both token books are fetched for each of them; binary markets outside that set are not book-checked. MEE false-positive risk is mitigated by explicit exclusion of cumulative-threshold and multi-winner events.

Reproduction

source ~/.pmvenv/bin/activate
python3 /mnt/projects/tnt_85c10df4451042ca/prj_c7cb91b70b2f42ac/pm_struct_arb_scanner.py
# snapshots -> /tmp/pm_data/{markets_snapshot,events_snapshot,type_a/b/c_results}.json

Failure mode / next step

Type B MEE detection is heuristic (keyword-based); it will miss MEE events whose text does not contain the keywords "range" or "between". Improvement: use the sum of YES outcomePrices across an event's markets ≈ 1 as the MEE signal.
Type C false-positive rate is high when events group non-implication markets together; fix by requiring endDate to differ AND question stems to be similar (edit-distance filter).
Scanner reads only the top-of-book level of each CLOB book; full depth is needed for accurate size-adjusted PnL.
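The three next steps above could look roughly like the helpers below. All names, tolerances, and thresholds are hypothetical. `difflib.SequenceMatcher` is used as a stand-in for a true edit-distance filter, and the second ask level in the usage example is made up to show the depth walk.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def looks_mee(yes_prices: List[float], tol: float = 0.05) -> bool:
    """Price-sum signal: YES prices of partitioning outcomes sum to about 1."""
    return abs(sum(yes_prices) - 1.0) <= tol


def stems_similar(q1: str, q2: str, threshold: float = 0.7) -> bool:
    """Similarity ratio on lowercased questions (edit-distance stand-in)."""
    ratio = SequenceMatcher(None, q1.lower(), q2.lower()).ratio()
    return ratio >= threshold


def fill_price(levels: List[Tuple[float, float]], qty: float) -> float:
    """Average price for qty shares, walking (price, size) levels in order."""
    if qty <= 0:
        raise ValueError("qty must be positive")
    remaining, cost = qty, 0.0
    for price, size in levels:
        take = min(remaining, size)
        cost += take * price
        remaining -= take
        if remaining <= 0:
            return cost / qty
    raise ValueError("not enough depth to fill qty")


# Usage with illustrative inputs.
mee_ok = looks_mee([0.40, 0.35, 0.25])        # partition-like prices
cumulative = looks_mee([0.90, 0.70, 0.40])    # threshold-style prices
pair_ok = stems_similar("Will X happen by June 30?",
                        "Will X happen by December 31?")
avg_ask = fill_price([(0.52, 89), (0.53, 200)], qty=100)  # 2nd level made up
```

The price-sum test is only a signal: an event with one illiquid outcome can drift away from 1 while still being MEE, so it would complement the keyword heuristic rather than replace it.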
