Andy Martin | almart.in

NFL DRAFTIGAMI

The pick-likelihood model.

Scorigami is the catalog of a sport's firsts: final scores that have never happened before, prized for the joy of finding more of them. NFL DRAFTIGAMI applies the idea to the draft: every "no team has ever taken position X at pick #Y" cell is a draftigami, and the tracker on this site enumerates them. What this page contributes is the other half of the project, a probability model that assigns a likelihood to every empty cell. Which ones could plausibly happen next year? Which would be legendary pulls? Here's how the model works.

Every spring, 32 NFL teams take turns picking the players they like best on national TV. No single pick is predictable. The pattern is. Quarterbacks tend to go early. Special teamers tend to go late. Running backs used to dominate the top of round one; now they barely show up there. Tight ends were almost never picked early in the 80s. Kyle Pitts went 4th overall in 2021. 12 personnel (one back, two tight ends, two receivers on the field at once) became the league's most efficient passing formation in the 2010s, and the draft absorbed it.

The model on this page is a 2-D tensor-product P-spline GAM that learned that pattern from 60 years of draft data. Rather than explain it, this page builds it. Eight rungs, each one a visualization that climbs one step higher than the last. Pick your context below — that's the steering wheel for the whole climb.

First · a guarantee
The model isn't allowed to look at the future.

Every cell page on this site (for example, /c/te-96, the cell for tight ends at pick #96) shows a sparkline of P(this position | this pick) going back to 1970. None of those values gets to see the future. The 2010 value was computed by a model trained on every draft from 1967 through 2009 only. Not 2011, not 2024, not 2027. The 1995 value used 1967–1994. The 1971 value used 1967–1970. And so on.

Forecasters call this forward chaining. For each year Y in the training range, the spline gets refit on the rows where year < Y, and we read year Y off that fit's surface. The basis stays fixed; only the coefficients β change. Each year has its own β, its own posterior covariance, its own credible interval, which is why the CI band on every sparkline is wide in 1970 (only three prior drafts to learn from) and narrow by 2025 (58 prior drafts). Eval and production go through the identical fit_forward_chained() path.

Wet streets don't cause the rain. Future drafts don't get to shape past predictions. The cost is real (58 separate spline fits per build instead of one), but the payoff is that every value you see on every cell page is an honest out-of-sample prediction at that point in history. A spline GAM is great at borrowing strength across neighboring picks, but only the picks that already exist when you're predicting. Looking sideways: fine. Looking forward: cheating.
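Here is the forward-chaining loop in miniature. This is a sketch, not the site's fit_forward_chained(): the function names are hypothetical, the toy data is made up, and a smoothed frequency count stands in for the spline so the "year < Y" rule is the only thing on display.

```python
import numpy as np

def fit_frequencies(positions, n_classes, alpha=1.0):
    # Stand-in for the spline fit: add-alpha smoothed class frequencies.
    counts = np.bincount(positions, minlength=n_classes).astype(float)
    return (counts + alpha) / (counts.sum() + alpha * n_classes)

def forward_chain(years, positions, n_classes, first_year, last_year):
    # One fit per target year, each trained only on strictly earlier drafts.
    preds = {}
    for target in range(first_year, last_year + 1):
        history = years < target
        preds[target] = fit_frequencies(positions[history], n_classes)
    return preds

# Toy data (made up): six picks, three position classes.
years = np.array([1967, 1967, 1968, 1969, 1970, 1971])
positions = np.array([0, 1, 0, 2, 2, 2])
preds = forward_chain(years, positions, n_classes=3, first_year=1970, last_year=1972)
print(preds[1970])  # computed from the 1967-1969 rows only
```

Swapping fit_frequencies for a real model changes nothing about the loop. The mask is the guarantee.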

[Interactive position picker: choose a position, plus an era for the Rung 5b and 6b auto-sweeps. Every chart that follows is filtered to that choice.]

Rung 1 · the concrete
One pick.

One dot. Two coordinates: the year of the draft (x) and the pick number (y). That's all a draft pick is, geometrically.

Rung 2 · every observation
Now do that for every pick of this position.

Each orange dot is one actual NFL draft pick of the selected position, rounds 1-7, going back to the 1967 Common Draft. Pre-1994 drafts ran 8 to 17 rounds; the dashed staircase shows where each era's round-7 cap was, and the gray region above it is the late-round data not on the chart.

You can already see the structure with your eyes: clusters at the top of round 1, bands across the late rounds. The model has to learn this from only the dots.

Rung 3 · the first abstraction
Forget the year. Just count.

Stack all the dots from Rung 2 onto the pick-number axis. How often does this position go at each pick number, ignoring when? That's the gray histogram. The blue curve is a 1-D cubic spline fit to that histogram, the smooth version.

The histogram is data. The curve is a model. A 1-D spline is just a smooth function with a finite number of bumps, fit by minimizing (distance to data) + (penalty for too much wiggle). Same machinery as the 2-D model, one axis short.
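That "distance to data plus wiggle penalty" recipe fits in a dozen lines. A minimal sketch, assuming a least-squares fit to per-pick shares, evenly spaced interior knots, and a 2nd-order difference penalty on the coefficients; pspline_1d, the synthetic shares, and the λ values are illustrative, not the site's settings.

```python
import numpy as np
from scipy.interpolate import BSpline

def pspline_1d(x, y, n_basis=30, degree=3, lam=10.0, lo=1.0, hi=262.0):
    # Clamped knots: each boundary repeated degree+1 times, interior evenly spaced.
    interior = np.linspace(lo, hi, n_basis - degree + 1)[1:-1]
    knots = np.r_[[lo] * (degree + 1), interior, [hi] * (degree + 1)]
    B = BSpline.design_matrix(x, knots, degree).toarray()   # (len(x), n_basis)
    D = np.diff(np.eye(n_basis), n=2, axis=0)               # 2nd-order differences
    # Minimize ||y - B beta||^2 + lam * ||D beta||^2 via the normal equations.
    beta = np.linalg.solve(B.T @ B + lam * D.T @ D, B.T @ y)
    return beta, B @ beta

# Synthetic "share of picks at each slot" with noise, for illustration only.
picks = np.arange(1, 263, dtype=float)
rng = np.random.default_rng(0)
share = 0.05 + 0.10 * np.exp(-picks / 40.0) + rng.normal(0.0, 0.02, picks.size)

beta_wiggly, curve_wiggly = pspline_1d(picks, share, lam=0.01)
beta_smooth, curve_smooth = pspline_1d(picks, share, lam=1000.0)
```

Turn lam up and the curve stiffens; turn it down and it chases the histogram. The 2-D model makes the same trade on two axes at once.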

Rung 4 · time enters
But the league drifts. So fit one curve per year.

Same idea as Rung 3, except now there's one gray curve for every year from 1970 through 2027, each one a forward-chained spline fit on its own out-of-sample slice of history. The bold black line is 2027 — next year's draft, the model's prediction. The gray cloud is every other year stacked behind it, so you can see how this position has drifted: TEs invisible early then climbing, RBs hollowing out at the top of round 1, OGs ascending in the 2020s.

All 58 curves still live on a 1-D pick axis. To see the whole surface at once, year and pick together, we need to lift to two dimensions.

Rung 5 · two dimensions of smoothing
Smooth jointly over year and pick.

Now we fit a single 2-D smooth surface across both axes at once: the tensor product of a year-spline and a pick-spline, regularized so the surface prefers gentle drift to abrupt jumps. The blue heatmap is that surface in 2027 — next year's projection. Orange dots are the same picks from Rung 2, drawn on top so you can see how the smooth fits the data.

[Heatmap readouts: P at pick #1, most likely pick, peak P. Legend: fitted P(pos | year, pick), shaded from 0% to the peak; orange dots are actual draft picks.]

Rung 5b · the same surface, animated
Now sweep the cursor through history.

Same heatmap as Rung 5, but the red year cursor walks left-to-right through every draft from 1970 forward, on its own. Watch the bright stripe under the cursor flow into and out of regions where this position concentrates. No button to push: time is its own rung.

Rung 6 · lift it
The probability surface, in 3-D.

Same surface as Rung 5, except now height is probability. Year runs along one ground edge, pick along the other, P points up. The orange spheres are real picks, anchored to the surface at their cell's height. Frozen at 2027 so you can read the static landscape: peaks where this position concentrates, plains where it doesn't. Drag to rotate, scroll to zoom.

Toggles let you peel the rungs back: hide the surface and you're back at Rung 2 (just dots in 3-D space). Show the threshold plane and any region where the model says "≥ 10% chance of this position" lights up as land above sea level. Show the decade ribbons and you can see how the per-decade curves (Rung 4) live inside the 2-D surface (Rung 5). Our Bills anchors stay labeled in 3-D; drag the camera and watch them follow.

Rung 6b · the surface, alive
Now watch the cursor sweep across the years.

Same 3-D landscape as Rung 6, now with the red year cursor walking forward through every draft. The ridge under the cursor lifts and flattens as the league's appetite for this position shifts. Grab and rotate freely; the sweep keeps running.

Eight rungs in. The static surface (Rung 6) is the model's projection of next year, frozen so you can read it. The animated surface here is the same model, sampled along the time axis — the shape of the league's preferences as they actually moved. The next rung asks the obvious question: is any of this any good?

Rung 7 · is the model good?
Let's grade it.

A model that draws a beautiful surface but predicts worse than a back-of-envelope baseline isn't worth the build. The honest test is forward-chained: for every draft from 1980 through 2026, fit the model on years strictly before, then score predictions against what actually happened. The foil is the trailing-10-year empirical frequency of this position at this pick (Laplace-smoothed so no cell hits zero), the strongest non-parametric baseline we could think up, with one parameter per (pick, position) cell. The model under test is the production gravity-map spline plus a 10% blend toward that same empirical frequency computed over a 30-year trailing window. The numbers cover all 11,027 held-out picks.
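The foil and the blend are simple enough to write down. A minimal sketch under assumptions: add-one smoothing (alpha = 1) is a guess at what "Laplace-smoothed" means here, the function names are hypothetical, and the five toy picks are made up.

```python
import numpy as np

def trailing_frequency(years, picks, positions, target_year, pick, n_classes,
                       window=10, alpha=1.0):
    # Smoothed share of each position at one pick slot over the trailing window,
    # using only drafts strictly before target_year.
    in_window = (years >= target_year - window) & (years < target_year)
    counts = np.bincount(positions[in_window & (picks == pick)],
                         minlength=n_classes).astype(float)
    return (counts + alpha) / (counts.sum() + alpha * n_classes)

def blend(p_model, p_empirical, weight=0.10):
    # Convex mix: still sums to 1 whenever both inputs do.
    return (1.0 - weight) * p_model + weight * p_empirical

# Toy data (made up): five drafts, all at pick #4, three position classes.
years = np.array([1995, 2000, 2005, 2009, 2010])
picks = np.array([4, 4, 4, 4, 4])
positions = np.array([0, 1, 1, 2, 0])

baseline = trailing_frequency(years, picks, positions, 2010, 4, 3, window=10)
wide = trailing_frequency(years, picks, positions, 2010, 4, 3, window=30)
p_model = np.array([0.5, 0.3, 0.2])   # placeholder for the spline's output
blended = blend(p_model, wide)
```

The smoothing is what keeps the baseline's log-loss finite: a cell with zero trailing picks still gets a sliver of probability.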

Coda · what's the model for, anyway?
The most-likely "first ever" picks of 2027.

Once you've fit a smooth surface, you can ask it about cells that haven't happened yet. The same model that drew the landscape above can rank every position-by-pick combination that has never existed in NFL draft history by its probability of filling for the first time in the next draft.

Some of those empty cells are likely-but-just-haven't-happened-yet (a Center at pick #93, could happen any year). Some of them are legendary pulls (a Specialist at pick #5, a Quarterback at pick #257) that the model says are so improbable they've never happened in 60 years and almost certainly won't this year. When one of those does fire off, that's a draftigami.
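Ranking the empty cells is a mask and a sort. A minimal sketch with a made-up 2 × 3 grid; rank_empty_cells and both arrays are hypothetical stand-ins for the fitted surface and the site's record of which cells have been filled.

```python
import numpy as np

def rank_empty_cells(prob, seen, top=3):
    # prob, seen: (n_positions, n_picks); column j corresponds to pick number j + 1.
    pos_idx, pick_idx = np.nonzero(~seen)
    order = np.argsort(-prob[pos_idx, pick_idx], kind="stable")[:top]
    return [(int(pos_idx[i]), int(pick_idx[i]) + 1,
             float(prob[pos_idx[i], pick_idx[i]])) for i in order]

# Toy surface (made up): 2 positions x 3 picks.
prob = np.array([[0.30, 0.10, 0.02],
                 [0.05, 0.20, 0.01]])
seen = np.array([[True, False, False],
                 [False, True, False]])
ranked = rank_empty_cells(prob, seen)
```

Each tuple is (position index, pick number, probability). The top of the list is the likely-but-not-yet cells; the bottom is where the legendary pulls live.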

→ See the live ranked list of remaining draftigamis

Appendix: how the model works

Model class

For a pick with features (year y, pick number p) and class k among the 13 positions, the log-odds against a reference class r (we use WR: common in every era, so a stable softmax baseline) are modeled as follows, with η_r fixed at 0:

η_k(y, p) = Σ_{a,b} β_{k,a,b} · B_a(y) · B_b(p)
P(class = k | y, p) = exp(η_k) / Σ_j exp(η_j)

B_a(y) are 25 cubic B-splines on year (clamped on 1967-2027); B_b(p) are 30 cubic B-splines on pick (clamped on 1-262). The tensor product gives 25 × 30 = 750 basis functions per active class. With K-1 = 12 active classes the model has 9,000 coefficients, small enough to fit by L-BFGS in a few seconds.
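To make the shapes concrete, here is the model class evaluated at a single (year, pick). A sketch under assumptions: evenly spaced interior knots, the reference class stored at index 0, and hypothetical helper names; the real knot placement and class ordering may differ.

```python
import numpy as np
from scipy.interpolate import BSpline

def clamped_knots(lo, hi, n_basis, degree=3):
    # Boundary knots repeated degree+1 times; interior knots evenly spaced (assumed).
    interior = np.linspace(lo, hi, n_basis - degree + 1)[1:-1]
    return np.r_[[lo] * (degree + 1), interior, [hi] * (degree + 1)]

def class_probs(year, pick, beta, t_year, t_pick, degree=3):
    # beta has shape (K-1, 25, 30): one coefficient grid per active class.
    By = BSpline.design_matrix(np.array([float(year)]), t_year, degree).toarray()[0]
    Bp = BSpline.design_matrix(np.array([float(pick)]), t_pick, degree).toarray()[0]
    eta_active = np.einsum("kab,a,b->k", beta, By, Bp)   # sum over a, b per class
    eta = np.r_[0.0, eta_active]                         # reference class pinned at 0
    eta -= eta.max()                                     # numerically stable softmax
    p = np.exp(eta)
    return p / p.sum()

t_year = clamped_knots(1967.0, 2027.0, 25)
t_pick = clamped_knots(1.0, 262.0, 30)
beta = np.zeros((12, 25, 30))                            # 12 * 750 = 9,000 coefficients
p = class_probs(2021, 4, beta, t_year, t_pick)
```

With every coefficient at zero, all 13 classes tie at 1/13. Everything the model knows lives in how β departs from that.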

Penalty

A 2nd-order difference matrix on each axis penalizes curvature, not magnitude. A flat coefficient grid pays zero penalty (no wiggle anywhere); a randomly squiggly grid pays a lot. This is the right inductive bias for a smoothly-evolving draft surface. Slow structural shifts (the WR explosion of the 2010s, the death of the FB) are cheap to encode; year-to-year reshuffling is expensive. Production runs at λ_y = 50, λ_p = 30. The year prior is deliberately stiff so the surface reads as a gentle ocean instead of forward-chained tree rings; the 1980-2026 held-out NLL is a hair better at λ_y=50 than at λ_y=10 (2.448 vs 2.452), so the smoother prior costs nothing.
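The penalty itself is two calls to np.diff. A minimal sketch for one class's 25 × 30 coefficient grid; the exact scaling (for instance, whether a factor of 1/2 is applied) is an assumption, and grid_penalty is a hypothetical name.

```python
import numpy as np

def grid_penalty(beta, lam_y=50.0, lam_p=30.0):
    # beta: (n_year_basis, n_pick_basis) coefficient grid for one class.
    d_year = np.diff(beta, n=2, axis=0)   # curvature along the year axis
    d_pick = np.diff(beta, n=2, axis=1)   # curvature along the pick axis
    return lam_y * np.sum(d_year ** 2) + lam_p * np.sum(d_pick ** 2)

flat = np.full((25, 30), 0.7)                      # no wiggle anywhere
ramp = np.add.outer(np.arange(25.0), np.arange(30.0))  # steady linear drift
spike = np.zeros((25, 30))
spike[10, 10] = 1.0                                # one coefficient out of line

print(grid_penalty(flat), grid_penalty(ramp), grid_penalty(spike))
```

Flat grids and steady ramps are free, since second differences vanish on both. A single out-of-line coefficient gets charged on both axes.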

Fit

Sparse cubic B-spline bases via scipy.interpolate.BSpline.design_matrix; tensor product computed row-by-row (each row has at most 16 nonzeros). Optimizer is scipy.optimize.minimize with method L-BFGS-B and an analytic gradient; the full Hessian is never formed during the fit. Forward chaining: for each year Y, refit on all rows where year < Y, using the previous year's β as an L-BFGS warm start. That drops iteration counts by 5–10× on consecutive years.
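A scaled-down version of that fit path, as a sketch rather than the contents of scripts/spline_model.py. Assumptions, all labeled in the code: small 6 × 8 bases, three classes, synthetic data, a dense design matrix for brevity, and a plain ridge term standing in for the difference penalty. Function names are hypothetical.

```python
import numpy as np
from scipy.interpolate import BSpline
from scipy.optimize import minimize

def clamped_knots(lo, hi, n_basis, degree=3):
    interior = np.linspace(lo, hi, n_basis - degree + 1)[1:-1]
    return np.r_[[lo] * (degree + 1), interior, [hi] * (degree + 1)]

def tensor_design(years, picks, t_year, t_pick, degree=3):
    # Row i is the outer product of the year and pick bases at row i, flattened.
    # Each factor has at most 4 nonzeros, so each row has at most 16.
    By = BSpline.design_matrix(years, t_year, degree).toarray()
    Bp = BSpline.design_matrix(picks, t_pick, degree).toarray()
    return np.einsum("ia,ib->iab", By, Bp).reshape(len(years), -1)

def nll_and_grad(theta, X, labels, n_classes, ridge):
    # Class 0 is the reference (eta = 0); theta holds the other classes.
    # The ridge term is a stand-in for the difference penalty (assumption).
    beta = theta.reshape(X.shape[1], n_classes - 1)
    eta = np.hstack([np.zeros((X.shape[0], 1)), X @ beta])
    eta -= eta.max(axis=1, keepdims=True)
    p = np.exp(eta)
    p /= p.sum(axis=1, keepdims=True)
    onehot = np.eye(n_classes)[labels]
    value = -np.log(p[np.arange(len(labels)), labels]).sum() + 0.5 * ridge * np.sum(beta ** 2)
    grad = X.T @ (p[:, 1:] - onehot[:, 1:]) + ridge * beta
    return value, grad.ravel()

def fit(X, labels, n_classes, x0=None, ridge=1.0):
    if x0 is None:
        x0 = np.zeros(X.shape[1] * (n_classes - 1))
    return minimize(nll_and_grad, x0, args=(X, labels, n_classes, ridge),
                    jac=True, method="L-BFGS-B")

# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
years = rng.integers(1967, 2027, size=400).astype(float)
picks = rng.integers(1, 263, size=400).astype(float)
labels = rng.integers(0, 3, size=400)
X = tensor_design(years, picks, clamped_knots(1967.0, 2027.0, 6), clamped_knots(1.0, 262.0, 8))

prev = fit(X[years < 2000], labels[years < 2000], n_classes=3)
warm = fit(X[years < 2001], labels[years < 2001], n_classes=3, x0=prev.x)  # warm start
cold = fit(X[years < 2001], labels[years < 2001], n_classes=3)             # from zeros
```

Both refits land on the same optimum; comparing warm.nit with cold.nit shows what the warm start buys on a given dataset. At production size the design matrix would stay sparse rather than going through toarray().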

Confidence intervals

Posterior covariance is approximated by inverting a block-diagonal Hessian, one (M × M) block per active class. Per-(year, pick) CI bands come from the delta method on the log-odds variance. Per-year inversions cost about a second per active class on M = 750; across all 58 years × 12 classes that's roughly a minute, paid once per build, amortized into time_series.values_lo / values_hi in data.json.
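One way to turn a Hessian block into a band, sketched under assumptions the text doesn't pin down: the log-odds variance is xᵀH⁻¹x for the basis row x, the map to probability uses the derivative p(1 − p) (which ignores cross-class terms, in keeping with the block-diagonal approximation), and z = 1.96 gives a 95% band. delta_method_band and the toy numbers are hypothetical.

```python
import numpy as np

def delta_method_band(x, hessian_block, p, z=1.96):
    # x: basis row (M,) at one (year, pick); hessian_block: (M, M) for one class;
    # p: fitted probability for that class at that cell.
    var_eta = float(x @ np.linalg.solve(hessian_block, x))   # x^T H^{-1} x
    sd_p = p * (1.0 - p) * np.sqrt(var_eta)                  # delta method
    return max(0.0, p - z * sd_p), min(1.0, p + z * sd_p)

# Toy numbers (made up): M = 3, Hessian = 4 * identity.
x = np.array([0.5, 0.5, 0.0])
H = 4.0 * np.eye(3)
lo, hi = delta_method_band(x, H, p=0.2)
```

More data means a larger Hessian, a smaller xᵀH⁻¹x, and a tighter band, which is the wide-then-narrow shape on the sparklines.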

How well does it do?

Forward-chained NLL on every draft from 1980 through 2026:

Variant                                     NLL      Brier
Naive frequency (rolling, all-prior)        ~4.13    0.93
GBM + σ=16 + EMA hl=3 (prior winner)        2.4553   0.9085
2-D tensor P-spline, λ_y=10 (prior)         2.4519   0.9080
2-D tensor P-spline, λ_y=50 (this model)    2.4482   0.9066

The stiffer year prior (λ_y=50) actually sharpens NLL by another 0.15% on top of the original spline's beat over the GBM stack. The bigger story is what the spline gives you for free: joint smoothing across (year, pick) instead of two separate post-hoc smoothers stapled together; deterministic refits; naturally normalized outputs (softmax, no row-renormalization step); and the Laplace-approximation CI bands above.
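For reference, the two metrics in that table can be computed as follows. A sketch assuming the multiclass Brier score is the per-pick sum of squared errors over all 13 classes, which is the convention that puts a uniform guess near 0.92; the function names are hypothetical.

```python
import numpy as np

def mean_nll(probs, actual):
    # probs: (n_picks, n_classes); actual: index of the position actually drafted.
    return float(-np.mean(np.log(probs[np.arange(len(actual)), actual])))

def multiclass_brier(probs, actual):
    onehot = np.eye(probs.shape[1])[actual]
    return float(np.mean(np.sum((probs - onehot) ** 2, axis=1)))

# Reference point: a uniform guess over 13 positions on four made-up picks.
uniform = np.full((4, 13), 1 / 13)
actual = np.array([0, 3, 3, 12])
print(mean_nll(uniform, actual), multiclass_brier(uniform, actual))

# Relative NLL gain of lambda_y=50 over lambda_y=10, from the table.
gain_pct = (2.4519 - 2.4482) / 2.4519 * 100
```

A uniform guess scores ln 13 ≈ 2.565 NLL and 12/13 ≈ 0.923 Brier, which gives the table's numbers a floor to be read against.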

Code lives in scripts/spline_model.py. The full writeup with ten diagnostic figures (calibration, residuals, GCV grid, effective DoF over time) is at docs/model.md.