AI for Playcalling: How Machine Learning Is Changing Football Strategy — Classroom Tutorial
Machine Learning · Programming · Sports Analytics


Unknown
2026-03-11
9 min read

Hands-on classroom tutorial: build an interpretable ML model to predict football play success using public datasets. Includes ethics and 2026 trends.

Hook: Turn students' love of football into a hands-on machine learning lab

Teachers and lifelong learners struggle to find classroom-ready data science projects that are both curriculum-aligned and rooted in real-world datasets. Building an ML model for football playcalling bridges that gap: students gain machine learning experience while studying a topic they already care about — from NFL strategy to stars like Caleb Williams — using public data and classroom-safe code.

Quick summary — What you'll learn and why it matters in 2026

In this tutorial you will: pick a public play-by-play dataset, define a clear success metric, engineer features, train a simple baseline and a stronger model, interpret results with explainable AI tools, and discuss ethics and real-world implications for football analytics and coaching. By 2026, NFL teams and broadcasters increasingly combine real-time analytics and ML assistants in play design and betting simulations — so teaching these skills now prepares students for modern sports data careers and responsible AI use.

Why focus on play success?

Play success is directly actionable for coaches and analysts. Instead of predicting final scores, a play-level model teaches students about feature engineering, class imbalance, and interpretability — all within a compact lab that runs on a classroom laptop.

Materials & prerequisites

  • Python 3.9+ with pandas, scikit-learn, xgboost, shap, matplotlib/seaborn installed
  • Optional: Google Colab or JupyterLab for an interactive classroom environment
  • Public datasets: nflfastR play-by-play (public CSVs) and Kaggle's NFL Big Data Bowl tracking/play labels for advanced labs
  • Basic familiarity with Python and pandas; no prior ML experience required

Step 1 — Choose the dataset (public and classroom-friendly)

For an accessible first lab use the nflfastR play-by-play dataset. It's regularly maintained and includes computed fields such as Expected Points Added (EPA), play type, down, distance, and yardline. For advanced classes that want tracking and pre-snap alignment features, use the Kaggle Big Data Bowl dataset (ensure you follow Kaggle's terms and protect player privacy when displaying tracking visuals).

  • nflfastR play-by-play CSVs (season-by-season) — a classroom staple
  • Kaggle Big Data Bowl — tracking + labels for advanced feature engineering
  • Sports analytics writeups and competition kernels — useful for inspiration

Step 2 — Define the target: what is "success"?

A clear, reproducible target is essential. Two practical choices:

  • EPA > 0 (Expected Points Added): captures the play's net effect on expected scoring; available directly in nflfastR and recommended for comparability across play types.
  • Binary yard-based success: the play succeeds if it gains a required share of the distance to go — a common convention is 40% of yards-to-go on 1st down, 60% on 2nd, and 100% on 3rd/4th. Simpler than EPA, but sensitive to how the down-and-distance thresholds are defined.

In this tutorial we'll use EPA > 0 as our binary target: 1 = success (EPA > 0), 0 = failure (EPA ≤ 0).
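Although we use EPA > 0 below, the yard-based alternative is worth showing students in code. This is a minimal sketch on a hand-made toy DataFrame; the 40/60/100% down-dependent thresholds are one common convention, not the only valid choice:

```python
import pandas as pd

# Toy plays: (down, yards_to_go, yards_gained) — synthetic data for illustration
plays = pd.DataFrame({
    'down':         [1, 2, 3, 4],
    'yards_to_go':  [10, 7, 4, 1],
    'yards_gained': [5, 3, 4, 0],
})

# Down-dependent thresholds: 40% of the distance on 1st down,
# 60% on 2nd, 100% on 3rd and 4th (a common convention)
thresholds = {1: 0.4, 2: 0.6, 3: 1.0, 4: 1.0}
required = plays['down'].map(thresholds) * plays['yards_to_go']
plays['success'] = (plays['yards_gained'] >= required).astype(int)

print(plays['success'].tolist())  # [1, 0, 1, 0]
```

A useful classroom exercise: compute both this label and the EPA > 0 label on real data and have students discuss where the two definitions disagree.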

Step 3 — Feature selection & engineering

Choose features that are available pre-snap. This avoids leakage and mirrors real playcalling decisions.

  • Game context: quarter, seconds remaining, score differential
  • Down/distance: down, yards_to_go, yardline_100 (distance to opponent end zone)
  • Formation & play type: play_type (pass/run), personnel_group, shotgun, play_action
  • Field position: redzone flags, inside opponent 20
  • Team strength proxies: home/away, team season win rate or Elo (optional)

Advanced labs: use tracking-derived features (pre-snap defensive alignment, distance to nearest defender) from Big Data Bowl to teach spatial feature engineering.
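The field-position and game-context features above reduce to a few pandas one-liners. This sketch uses synthetic rows with nflfastR-style column names; the `trailing_late` feature is our own illustrative addition, not a standard dataset field:

```python
import pandas as pd

# Synthetic play-by-play rows; column names follow nflfastR conventions
pbp = pd.DataFrame({
    'yardline_100':           [85, 15, 3],   # yards to the opponent end zone
    'quarter':                [1, 4, 4],
    'game_seconds_remaining': [3400, 120, 40],
    'score_differential':     [0, -4, -4],
})

# Field-position flags from the bullet list above
pbp['redzone'] = (pbp['yardline_100'] <= 20).astype(int)
pbp['goal_to_go_area'] = (pbp['yardline_100'] <= 10).astype(int)

# Game-context feature (illustrative): trailing in the final five minutes,
# a situation that often drives pass-heavy playcalling
pbp['trailing_late'] = ((pbp['score_differential'] < 0)
                        & (pbp['game_seconds_remaining'] <= 300)).astype(int)

print(pbp[['redzone', 'trailing_late']].values.tolist())
```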

Step 4 — Preprocessing best practices

  1. Filter to play types you want (e.g., exclude penalties & spikes).
  2. Encode categoricals with one-hot or target encoding (explain trade-offs).
  3. Handle imbalanced classes — success rates are often ~40–45%; use stratified splits or class weighting.
  4. Train/test split by season or by game to avoid leakage: if you split randomly you risk leakage across drives from the same game.
  5. Baseline metrics: accuracy can be misleading; prioritize AUC, precision/recall, and calibration (Brier score).
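Step 4 (splitting by game rather than by play) is the one students most often get wrong, so it is worth demonstrating explicitly. One way to do it is scikit-learn's GroupShuffleSplit, shown here on a tiny synthetic table with made-up `game_id` values:

```python
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit

# Synthetic plays tagged with game_id; real labels would come from the dataset
pbp = pd.DataFrame({
    'game_id': ['g1'] * 4 + ['g2'] * 4 + ['g3'] * 4,
    'success': [1, 0, 1, 0] * 3,
})

# GroupShuffleSplit keeps every play from a given game on the same side of
# the split, so drives from one game cannot leak from train into test
gss = GroupShuffleSplit(n_splits=1, test_size=0.33, random_state=42)
train_idx, test_idx = next(gss.split(pbp, groups=pbp['game_id']))

train_games = set(pbp.loc[train_idx, 'game_id'])
test_games = set(pbp.loc[test_idx, 'game_id'])
print(train_games & test_games)  # disjoint by construction: prints set()
```

Splitting by season (train on earlier seasons, test on the latest) gives the same protection and is even easier to explain to students.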

Step 5 — Build a simple model (classroom-ready code)

Start with a logistic regression baseline, then train an XGBoost classifier as a stronger model. Below is condensed Python starter code suitable for a classroom notebook. Keep runtime low by sampling a single season or a subset of plays.

# Load data (pandas)
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# Example: load a csv exported from nflfastR
pbp = pd.read_csv('pbp_2025.csv')

# Define target
pbp['success'] = (pbp['epa'] > 0).astype(int)

# Feature subset (pre-snap only)
features = ['down', 'yards_to_go', 'yardline_100', 'quarter',
            'score_differential', 'shotgun', 'play_type']
X = pd.get_dummies(pbp[features], drop_first=True)
y = pbp['success']

# Split by game to avoid leakage: hold out the last 20% of games
games = pbp['game_id'].drop_duplicates()
test_games = set(games.iloc[int(len(games) * 0.8):])
test_mask = pbp['game_id'].isin(test_games)
X_train, X_test = X[~test_mask], X[test_mask]
y_train, y_test = y[~test_mask], y[test_mask]

# Baseline model
clf = LogisticRegression(max_iter=500, class_weight='balanced')
clf.fit(X_train, y_train)
print('AUC:', roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1]))

Then train an XGBoost model for improved performance (classroom note: set n_estimators small, e.g., 100, for fast training).

Step 6 — Evaluate and interpret

Don't stop at a single metric. A recommended evaluation checklist:

  • AUC-ROC for ranking skill.
  • Precision at k if you want the model to pick the top-X plays.
  • Calibration plot to see whether predicted probabilities match observed success rates.
  • Lift charts and confusion matrices for decision thresholds.
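The calibration item in the checklist above is easy to demonstrate with scikit-learn's `brier_score_loss` and `calibration_curve`. This sketch again uses synthetic data as a stand-in for real plays:

```python
from sklearn.calibration import calibration_curve
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import train_test_split

# Synthetic stand-in for play features and success labels
X, y = make_classification(n_samples=3000, n_features=6, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=500).fit(X_tr, y_tr)
proba = clf.predict_proba(X_te)[:, 1]

# Brier score: mean squared error of predicted probabilities (lower is better)
brier = brier_score_loss(y_te, proba)

# Points for a reliability plot: observed success rate per predicted-probability bin
frac_pos, mean_pred = calibration_curve(y_te, proba, n_bins=10)
print(f'Brier score: {brier:.3f}')
```

Plotting `frac_pos` against `mean_pred` (with a y = x reference line) gives the calibration plot from the checklist; a well-calibrated model hugs the diagonal.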

Model interpretability: SHAP & partial dependence

Interpretability is essential in classrooms and in real playcalling. Explainable AI methods help students move from "black box" predictions to actionable insights:

  • Feature importance (tree-based): quick but can mislead when features are correlated.
  • SHAP values: show per-play contribution of each feature to the predicted probability; excellent for classroom demos.
  • Partial dependence plots: visualize how a feature (e.g., yards_to_go) affects predicted success, holding others constant.
# SHAP example (after training XGBoost model 'xgb')
import shap
explainer = shap.Explainer(xgb)
shap_values = explainer(X_test)
shap.summary_plot(shap_values, X_test)
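The partial dependence bullet above can be demonstrated with scikit-learn's inspection module. This sketch substitutes a synthetic dataset and a GradientBoostingClassifier for the trained model; in the lab, feature 0 would correspond to something like yards_to_go:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import partial_dependence

# Synthetic stand-in for the play feature matrix and success labels
X, y = make_classification(n_samples=1000, n_features=5, random_state=1)
model = GradientBoostingClassifier(n_estimators=50, random_state=1).fit(X, y)

# Average predicted success probability as feature 0 varies across a grid,
# with the other features held at their observed values
pd_result = partial_dependence(model, X, features=[0], kind='average')
curve = pd_result['average'][0]
print(f'{len(curve)} grid points on the partial dependence curve')
```

For a ready-made plot, `sklearn.inspection.PartialDependenceDisplay.from_estimator` draws the same curve with one call.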

Use SHAP plots to ask students: which situations most strongly change the model's recommendation? How would a coach use that insight?

"A model that can't explain why it prefers a play isn't useful for a coach making a live decision." — Classroom best practice

Step 7 — Classroom activities & extensions

Turn the tutorial into interactive lessons and assessments:

  • Activity 1: Coach vs Model — split the class into teams: one uses the model to call plays, the other uses human logic; simulate drives and score outcomes.
  • Activity 2: Interpretability lab — each student explains three high-SHAP plays to the class, practicing data communication.
  • Extension: Add tracking features from Big Data Bowl and compare model lift; discuss privacy constraints.
  • Assessment idea: ask students to write a 500-word memo recommending how a high-school coach should use the model (focus on transparency and limits).

Ethics, privacy, and real-world impacts

Teaching ML for playcalling offers a perfect springboard into ethics. Discuss these key concerns with students:

  • Player privacy: tracking data reveals micro-movements. Always follow dataset terms and anonymize when presenting replays.
  • Gambling and integrity: models change betting markets; media simulation models already move lines — note SportsLine's 10,000-simulation approach in 2026-era media coverage.
  • Bias & fairness: model could reinforce conservative playcalling if trained on particular coaching philosophies. Ask: who benefits from these predictions?
  • Human oversight: models should augment — not replace — coach judgment. Emphasize explainability before operational deployment.

Recent 2025–2026 trends: teams and broadcasters are embracing near-real-time ML tools. The NFL and analytics communities are debating controlled access to tracking feeds to balance innovation with privacy and competitive fairness; your classroom should reflect these debates so students learn both tech and responsibility.

Connecting to current events: Caleb Williams, the Bears, and analytics

In 2026, high-profile players and teams (including rookie-to-star transitions like Caleb Williams's move to the NFL and the Chicago Bears' offensive innovation) are frequently discussed alongside analytics. Use recent coverage as context: media outlets now combine ML simulations and human analysis to predict playoff outcomes and playcalling tendencies. This makes classroom exercises timely — students can compare their model's suggestions with analyst picks and discuss where ML and human intuition align or diverge.

For higher-level students, present modern directions that have emerged by 2026:

  • Multimodal models: fusion of video, tracking, and play-by-play to suggest plays conditioned on real-time opponent alignment.
  • Edge deployment: teams experimenting with on-field inference on low-latency devices to assist playcallers during halftime and per-drive adjustments.
  • Federated learning: a proposed approach to train across teams without sharing raw tracking data, addressing privacy and competitive concerns.
  • Model documentation standards: increasing adoption of model cards and datasheets to ensure transparency in sports ML systems.

Classroom safety checklist & reproducibility

  • Always cite data sources and respect dataset licenses.
  • Anonymize or avoid identifying specific players when discussing tracking visuals in younger classrooms.
  • Provide starter notebooks and seed random_state values to make experiments reproducible.
  • Record baseline metrics so students can iterate and report improvement responsibly.

Actionable takeaways — a one-page checklist

  1. Download one season of nflfastR and inspect the columns.
  2. Define your target: use EPA > 0 for robustness.
  3. Train a logistic regression baseline; report AUC & calibration.
  4. Train an XGBoost model and compare lift vs baseline.
  5. Use SHAP to interpret four example plays and write a short memo on model limitations.
  6. Discuss ethical issues: privacy, gambling, and trust in AI-assisted decisions.

Final classroom notes: communication beats raw accuracy

In a teaching setting, the goal is not to build the world's most accurate play predictor — it's to teach scientific thinking, reproducible ML workflows, and ethical considerations. When students can explain what drives a model's prediction and why a coach might accept or reject it, learning has succeeded.

Call to action

Ready to run this lab? Download the starter notebook I prepared for classrooms in 2026 (includes sample code, datasets, and assessment rubrics). Try the lesson with one class period and iterate: let students present model explanations as their summative project. Share your best student notebooks with the naturalscience.uk community so we can build a public library of classroom-ready sports ML lessons.


Related Topics

#Machine Learning#Programming#Sports Analytics
