AI for Playcalling: How Machine Learning Is Changing Football Strategy — Classroom Tutorial
Hands-on classroom tutorial: build an interpretable ML model to predict football play success using public datasets. Includes ethics and 2026 trends.
Hook: Turn students' love of football into a hands-on machine learning lab
Teachers and lifelong learners struggle to find classroom-ready data science projects that are both curriculum-aligned and rooted in real-world datasets. Building an ML model for football playcalling bridges that gap: students gain machine learning experience while studying a topic they already care about — from NFL strategy to stars like Caleb Williams — using public data and classroom-safe code.
Quick summary — What you'll learn and why it matters in 2026
In this tutorial you will: pick a public play-by-play dataset, define a clear success metric, engineer features, train a simple baseline and a stronger model, interpret results with explainable AI tools, and discuss ethics and real-world implications for football analytics and coaching. By 2026, NFL teams and broadcasters increasingly combine real-time analytics and ML assistants in play design and betting simulations — so teaching these skills now prepares students for modern sports data careers and responsible AI use.
Why focus on play success?
Play success is directly actionable for coaches and analysts. Instead of predicting final scores, a play-level model teaches students about feature engineering, class imbalance, and interpretability — all within a compact lab that runs on a classroom laptop.
Materials & prerequisites
- Python 3.9+ with pandas, scikit-learn, xgboost, shap, matplotlib/seaborn installed
- Optional: Google Colab or JupyterLab for an interactive classroom environment
- Public datasets: nflfastR play-by-play (public CSVs) and Kaggle's NFL Big Data Bowl tracking/play labels for advanced labs
- Basic familiarity with Python and pandas; no prior ML experience required
Step 1 — Choose the dataset (public and classroom-friendly)
For an accessible first lab use the nflfastR play-by-play dataset. It's regularly maintained and includes computed fields such as Expected Points Added (EPA), play type, down, distance, and yardline. For advanced classes that want tracking and pre-snap alignment features, use the Kaggle Big Data Bowl dataset (ensure you follow Kaggle's terms and protect player privacy when displaying tracking visuals).
Dataset links (2026 checked)
- nflfastR play-by-play CSVs (season-by-season) — a classroom staple
- Kaggle Big Data Bowl — tracking + labels for advanced feature engineering
- Sports analytics writeups and competition kernels — useful for inspiration
Step 2 — Define the target: what is "success"?
A clear, reproducible target is essential. Two practical choices:
- EPA > 0 (Expected Points Added): captures net value of the play to score probability; available directly in nflfastR and recommended for fairness across play types.
- Binary yard-based success: a common convention counts a play as successful if it gains at least 40% of yards-to-go on 1st down, 60% on 2nd down, and 100% on 3rd or 4th down; simpler to explain but sensitive to how you set those down-dependent thresholds.
In this tutorial we'll use EPA > 0 as our binary target: 1 = success (EPA > 0), 0 = failure (EPA ≤ 0).
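For comparison, the yard-based alternative can be sketched in a few lines of pandas. This is a minimal sketch assuming columns named `down`, `ydstogo`, and `yards_gained` (nflfastR-style names; your export may differ):

```python
import pandas as pd

def yard_based_success(df: pd.DataFrame) -> pd.Series:
    """Down-dependent success: at least 40% of yards-to-go on 1st down,
    60% on 2nd, 100% on 3rd/4th (a common analytics convention)."""
    frac = df['down'].map({1: 0.4, 2: 0.6, 3: 1.0, 4: 1.0})
    return (df['yards_gained'] >= frac * df['ydstogo']).astype(int)

# Tiny worked example: 1st & 10 gaining 5 succeeds; 2nd & 8 gaining 4
# falls short of the 60% threshold; 3rd & 4 gaining 3 fails outright.
plays = pd.DataFrame({
    'down': [1, 2, 3],
    'ydstogo': [10, 8, 4],
    'yards_gained': [5, 4, 3],
})
print(yard_based_success(plays).tolist())  # [1, 0, 0]
```

Having students compute both targets on the same plays and compare disagreement rates makes a good warm-up discussion.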
Step 3 — Feature selection & engineering
Choose features that are available pre-snap. This avoids leakage and mirrors real playcalling decisions.
- Game context: quarter, seconds remaining, score differential
- Down/distance: down, yards_to_go, yardline_100 (distance to opponent end zone)
- Formation & play type: play_type (pass/run), personnel_group, shotgun, play_action
- Field position: redzone flags, inside opponent 20
- Team strength proxies: home/away, team season win rate or Elo (optional)
Advanced labs: use tracking-derived features (pre-snap defensive alignment, distance to nearest defender) from Big Data Bowl to teach spatial feature engineering.
Step 4 — Preprocessing best practices
- Filter to play types you want (e.g., exclude penalties & spikes).
- Encode categoricals with one-hot or target encoding (explain trade-offs).
- Handle imbalanced classes — success rates are often ~40–45%; use stratified splits or class weighting.
- Train/test split by season or by game to avoid leakage: if you split randomly you risk leakage across drives from the same game.
- Baseline metrics: accuracy can be misleading; prioritize AUC, precision/recall, and calibration (Brier score).
Step 5 — Build a simple model (classroom-ready code)
Start with a logistic regression baseline, then train an XGBoost classifier as a stronger model. Below is condensed Python code suitable for a classroom notebook. Keep runtime low by sampling a single season or a subset of plays.
# Load data (pandas)
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, accuracy_score, precision_score
# Example: load a csv exported from nflfastR
pbp = pd.read_csv('pbp_2025.csv')
# Keep only run/pass plays with a valid EPA value
pbp = pbp[pbp['play_type'].isin(['run', 'pass'])].dropna(subset=['epa']).reset_index(drop=True)
# Define target
pbp['success'] = (pbp['epa'] > 0).astype(int)
# Feature subset
features = ['down','yards_to_go','yardline_100','quarter','score_differential','shotgun','play_type']
X = pd.get_dummies(pbp[features], drop_first=True)
y = pbp['success']
# Split by game to avoid leakage across plays from the same game
from sklearn.model_selection import GroupShuffleSplit
gss = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
train_ix, test_ix = next(gss.split(X, y, groups=pbp['game_id']))
X_train, X_test = X.iloc[train_ix], X.iloc[test_ix]
y_train, y_test = y.iloc[train_ix], y.iloc[test_ix]
# Baseline model
clf = LogisticRegression(max_iter=500, class_weight='balanced')
clf.fit(X_train, y_train)
print('AUC:', roc_auc_score(y_test, clf.predict_proba(X_test)[:,1]))
Then train an XGBoost model for improved performance (classroom note: set n_estimators small, e.g., 100, for fast training).
Step 6 — Evaluate and interpret
Don't stop at a single metric. A recommended evaluation checklist:
- AUC-ROC for ranking skill.
- Precision at k if you want the model to pick the top-X plays.
- Calibration plot to see whether predicted probabilities match observed success rates.
- Lift charts and confusion matrices for decision thresholds.
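The calibration and Brier-score items on the checklist are one-liners in scikit-learn. A sketch on toy data (swap in your real fitted classifier and held-out test set):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.calibration import calibration_curve
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import train_test_split

# Toy data standing in for play features
rng = np.random.default_rng(0)
X = rng.normal(size=(3000, 5))
y = (X[:, 0] + rng.normal(scale=0.8, size=3000) > 0).astype(int)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

clf = LogisticRegression(max_iter=500).fit(X_tr, y_tr)
proba = clf.predict_proba(X_te)[:, 1]

# Brier score: mean squared error of the probabilities (lower is better)
brier = brier_score_loss(y_te, proba)
print('Brier score: %.3f' % brier)

# Calibration curve: mean predicted probability vs observed rate per bin
frac_pos, mean_pred = calibration_curve(y_te, proba, n_bins=10)
for p, f in zip(mean_pred, frac_pos):
    print('predicted %.2f -> observed %.2f' % (p, f))
```

A well-calibrated model's predicted and observed columns track each other closely; large gaps are a good discussion prompt.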
Model interpretability: SHAP & partial dependence
Interpretability is essential in classrooms and in real playcalling. Explainable AI methods help students move from "black box" predictions to actionable insights:
- Feature importance (tree-based): quick but can mislead when features are correlated.
- SHAP values: show per-play contribution of each feature to the predicted probability; excellent for classroom demos.
- Partial dependence plots: visualize how a feature (e.g., yards_to_go) affects predicted success, holding others constant.
# SHAP example (after training XGBoost model 'xgb')
import shap
explainer = shap.Explainer(xgb)
shap_values = explainer(X_test)
shap.summary_plot(shap_values, X_test)
Use SHAP plots to ask students: which situations most strongly change the model's recommendation? How would a coach use that insight?
"A model that can't explain why it prefers a play isn't useful for a coach making a live decision." — Classroom best practice
Step 7 — Classroom activities & extensions
Turn the tutorial into interactive lessons and assessments:
- Activity 1: Coach vs Model — split the class into teams: one uses the model to call plays, the other uses human logic; simulate drives and score outcomes.
- Activity 2: Interpretability lab — each student explains three high-SHAP plays to the class, practicing data communication.
- Extension: Add tracking features from Big Data Bowl and compare model lift; discuss privacy constraints.
- Assessment idea: ask students to write a 500-word memo recommending how a high-school coach should use the model (focus on transparency and limits).
Ethics, privacy, and real-world impacts
Teaching ML for playcalling offers a perfect springboard into ethics. Discuss these key concerns with students:
- Player privacy: tracking data reveals micro-movements. Always follow dataset terms and anonymize when presenting replays.
- Gambling and integrity: predictive models influence betting markets, and media simulation models (such as SportsLine's widely cited 10,000-simulation projections) already shape public expectations and lines.
- Bias & fairness: model could reinforce conservative playcalling if trained on particular coaching philosophies. Ask: who benefits from these predictions?
- Human oversight: models should augment — not replace — coach judgment. Emphasize explainability before operational deployment.
Recent 2025–2026 trends: teams and broadcasters are embracing near-real-time ML tools. The NFL and analytics communities are debating controlled access to tracking feeds to balance innovation with privacy and competitive fairness; your classroom should reflect these debates so students learn both tech and responsibility.
Connecting to current events: Caleb Williams, the Bears, and analytics
In 2026, high-profile players and teams (for example, Caleb Williams's development within the Chicago Bears' evolving offense) are frequently discussed alongside analytics. Use recent coverage as context: media outlets now combine ML simulations with human analysis to predict playoff outcomes and playcalling tendencies. This makes classroom exercises timely — students can compare their model's suggestions with analyst picks and discuss where ML and human intuition align or diverge.
Advanced strategies & 2026 trends for further study
For higher-level students, present modern directions that have emerged by 2026:
- Multimodal models: fusion of video, tracking, and play-by-play to suggest plays conditioned on real-time opponent alignment.
- Edge deployment: teams experimenting with on-field inference on low-latency devices to assist playcallers during halftime and per-drive adjustments.
- Federated learning: a proposed approach to train across teams without sharing raw tracking data, addressing privacy and competitive concerns.
- Model documentation standards: increasing adoption of model cards and datasheets to ensure transparency in sports ML systems.
Classroom safety checklist & reproducibility
- Always cite data sources and respect dataset licenses.
- Anonymize or avoid identifying specific players when discussing tracking visuals in younger classrooms.
- Provide starter notebooks and seed random_state values to make experiments reproducible.
- Record baseline metrics so students can iterate and report improvement responsibly.
Actionable takeaways — a one-page checklist
- Download one season of nflfastR and inspect the columns.
- Define your target: use EPA > 0 for robustness.
- Train a logistic regression baseline; report AUC & calibration.
- Train an XGBoost model and compare lift vs baseline.
- Use SHAP to interpret four example plays and write a short memo on model limitations.
- Discuss ethical issues: privacy, gambling, and trust in AI-assisted decisions.
Final classroom notes: communication beats raw accuracy
In a teaching setting, the goal is not to build the world's most accurate play predictor — it's to teach scientific thinking, reproducible ML workflows, and ethical considerations. When students can explain what drives a model's prediction and why a coach might accept or reject it, learning has succeeded.
Call to action
Ready to run this lab? Download the starter notebook I prepared for classrooms in 2026 (includes sample code, datasets, and assessment rubrics). Try the lesson with one class period and iterate: let students present model explanations as their summative project. Share your best student notebooks with the naturalscience.uk community so we can build a public library of classroom-ready sports ML lessons.