Position-by-Position: Using College Football Matchups to Teach Data Visualization and Comparative Statistics


2026-02-25
9 min read

Use the Indiana vs. Miami CFP final to teach data visualization, normalization, and statistical significance with classroom-ready datasets and lesson plans.

Hook: Turn a high-profile college football matchup into a standards-aligned data unit

Teachers, coaches, and lifelong learners struggle to find classroom-ready datasets and activities that connect real-world excitement to core statistical concepts. The 2026 College Football Playoff National Championship between Indiana and Miami is a timely, motivating anchor for a unit on data visualization and comparative statistics. This article gives you a ready-to-use, curriculum-aligned lesson plan with step-by-step activities, reproducible visualizations, and assessments that teach statistical significance through position-by-position analysis.

Top-line: What students will learn and why it matters now (2026 context)

In 2026, sports analytics tools and accessible play-by-play/tracking datasets have become more available to classrooms. Use the Indiana vs. Miami matchup to teach:

  • Data collection & cleaning from play-by-play and position-group summaries
  • Exploratory Data Analysis (EDA) using visualizations: bar charts, radar plots, heatmaps and distributions
  • Comparative statistics: rate normalization, significance testing, effect size and confidence intervals
  • Critical interpretation: correlation vs. causation and pitfalls in sports analytics

Actionable takeaway: within one 60–90-minute lesson, students can produce a clear comparative visualization of the two teams by position and perform a basic hypothesis test with interpretation.

Why Indiana vs. Miami is an ideal classroom case study

The matchup is both topical and rich in comparative angles. Position-by-position breakdowns (e.g., offensive line vs. defensive line, quarterback vs. secondary) let students compare like-with-like and practice normalization (per snap, per play, per drive). Using a real, current game increases engagement while giving teachers entry points to standards-aligned outcomes (data literacy, statistics, computational thinking).

“Use a single, compelling event to teach a bundle of transferable data skills — from cleaning to inference.”

Datasets & sources you can use in class (free and subscription)

Choose datasets based on grade level and classroom resources. Below are practical sources that have seen growing classroom use since late 2024, as more college-level play-by-play and tracking feeds became public.

  • College Football Data API (play-by-play, team-level): good for basic per-play statistics. (collegefootballdata.com)
  • Sports-Reference / cfbstats: box scores and season summaries for per-position counting stats. (sports-reference.com)
  • Third-party analytics providers (PFF, Next Gen Stats-style products): use small extracts for advanced classroom work where licensing allows
  • Synthetic dataset: pre-cleaned CSVs that mimic Indiana vs. Miami position-by-position metrics for classrooms without internet access (we provide classroom-ready CSV templates in the lesson pack)

Lesson plan overview — 3-lesson sequence (adaptable for 9–12 and introductory college)

Lesson 1: Data collection, cleaning and normalization (60 minutes)

  • Learning goals: read play-by-play CSVs, merge roster data, compute per-snap and per-drive rates.
  • Materials: laptop, spreadsheet or Jupyter/RStudio, sample CSV (team_play_by_play.csv)
  • Class activity (30 min): identify position groups (QB, RB, WR, OL, DL, LB, DB, ST) and calculate per-100-play rates for yards, pressures, and tackles.
  • Assessment (30 min): short worksheet asking students to justify normalization choice (per snap vs. per drive) and to produce a cleaned table of per-position rates for Indiana and Miami.
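The cleaning-and-normalization step above can be sketched in pandas. The column names and values here are hypothetical stand-ins for the lesson-pack CSV (team_play_by_play.csv); adapt them to whatever schema your data source actually provides.

```python
import pandas as pd

# Hypothetical mini-dataset mimicking team_play_by_play.csv:
# one row per play, with team, position group, and yards gained.
pbp = pd.DataFrame({
    "team": ["Indiana"] * 4 + ["Miami"] * 4,
    "position_group": ["QB", "QB", "RB", "RB"] * 2,
    "yards": [7, 12, 3, 5, 9, 0, 6, 4],
})

# Per-100-play rate = (total metric / number of plays) * 100
rates = (
    pbp.groupby(["team", "position_group"])["yards"]
       .agg(plays="count", total="sum")
       .assign(yards_per_100=lambda d: d["total"] / d["plays"] * 100)
       .reset_index()
)
print(rates)
```

Students can repeat the same groupby-and-rate pattern for pressures or tackles by swapping the metric column, which makes the normalization choice (per play vs. per drive) explicit and discussable.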

Lesson 2: Visualization and exploratory analysis (60–90 minutes)

  • Learning goals: pick appropriate visual forms to compare positions; interpret distributions vs. summary statistics.
  • Materials: spreadsheet charts or Python (pandas + matplotlib/seaborn) / R (tidyverse + ggplot2)
  • Class activity: produce three visualizations comparing Indiana vs. Miami by position:
    • Bar chart of mean yards per play by position group
    • Radar (spider) chart showing multi-metric profile for each position group
    • Boxplots or violin plots of per-play distributions for a selected metric (e.g., QB completion yardage or rush distances)
  • Discussion prompt: Which visualization best communicates differences? Why?
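A minimal matplotlib sketch of the first visualization (the grouped bar chart) is below. The numbers are illustrative, made-up means, not real Indiana or Miami statistics; in class, students would feed in the per-position rates computed in Lesson 1.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, safe for classroom servers
import matplotlib.pyplot as plt
import numpy as np

# Illustrative (made-up) mean yards per play by position group
groups = ["QB", "RB", "WR"]
indiana = [7.1, 4.8, 9.2]
miami = [6.5, 5.3, 8.7]

x = np.arange(len(groups))
width = 0.35
fig, ax = plt.subplots()
ax.bar(x - width / 2, indiana, width, label="Indiana")
ax.bar(x + width / 2, miami, width, label="Miami")
ax.set_xticks(x)
ax.set_xticklabels(groups)
ax.set_ylabel("Mean yards per play")
ax.set_title("Indiana vs. Miami by position group (illustrative data)")
ax.legend()
fig.savefig("comparison.png")
```

Swapping `ax.bar` for `ax.boxplot` on the raw per-play values is a natural follow-up for the distribution-based discussion prompt.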

Lesson 3: Comparative statistics and significance (90 minutes)

  • Learning goals: conduct hypothesis tests, compute effect sizes and confidence intervals, handle multiple comparisons.
  • Materials: pre-cleaned per-play datasets and calculator or coding environment
  • Class activity (60 min):
    1. Formulate hypotheses: e.g., "Indiana's offensive line allows fewer pressures per 100 pass plays than Miami's defensive front."
    2. Choose test: t-test for roughly normal, Mann–Whitney U if skewed, or permutation test for classroom-friendly randomization inference
    3. Compute effect size (Cohen's d) and 95% confidence intervals (bootstrap if appropriate)
    4. Apply a multiple-comparison correction (Bonferroni or Benjamini–Hochberg) when testing across many positions
  • Assessment (30 min): students submit a short report that answers: Are differences statistically significant? Are they practically meaningful? What caveats exist?
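The Lesson 3 workflow — permutation test, Cohen's d, and a bootstrap confidence interval — can be sketched in plain NumPy. The two samples here are synthetic stand-ins for per-play values; students would substitute the real per-play metric they cleaned in Lesson 1.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-ins for per-play values (e.g., yards per play)
indiana = rng.normal(6.2, 2.5, 80)
miami = rng.normal(5.4, 2.5, 75)

obs = indiana.mean() - miami.mean()

# Permutation test: shuffle team labels and recompute the mean difference
pooled = np.concatenate([indiana, miami])
n = len(indiana)
perm_diffs = np.empty(5000)
for i in range(5000):
    shuffled = rng.permutation(pooled)
    perm_diffs[i] = shuffled[:n].mean() - shuffled[n:].mean()
p_value = np.mean(np.abs(perm_diffs) >= abs(obs))

# Cohen's d using the pooled standard deviation
sp = np.sqrt(((len(indiana) - 1) * indiana.var(ddof=1)
              + (len(miami) - 1) * miami.var(ddof=1))
             / (len(indiana) + len(miami) - 2))
d = obs / sp

# Bootstrap 95% CI for the mean difference
boot = np.empty(5000)
for i in range(5000):
    boot[i] = (rng.choice(indiana, n, replace=True).mean()
               - rng.choice(miami, len(miami), replace=True).mean())
ci = np.percentile(boot, [2.5, 97.5])

print(f"diff={obs:.2f}, p={p_value:.3f}, d={d:.2f}, "
      f"95% CI=({ci[0]:.2f}, {ci[1]:.2f})")
```

The permutation loop doubles as a teaching moment: the p-value is literally "how often would a shuffle of team labels produce a gap this large?", which is far more intuitive than a t-table.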

Detailed classroom exercise — Position-by-position comparison

This is a reproducible classroom-ready task that you can complete in one double period or across two lessons.

Step 0: Learning outcomes

  • Students will create a multi-panel visualization comparing Indiana and Miami by position.
  • Students will perform at least one statistical test to assess whether observed differences are unlikely under chance.
  • Students will write a 300-word evidence-based interpretation that includes limitations and real-world implications.

Step 1: Decide on metrics and normalization

Pick 3–5 comparable metrics for each position group. Examples:

  • Offense: yards per play, success rate (under the standard definition: gaining at least 50% of needed yards on 1st down, 70% on 2nd, 100% on 3rd/4th), pressure rate allowed per 100 pass plays
  • Defense: tackles for loss per game, QB pressure rate, pass breakups per target
  • Special teams: kick return average, punting net yards

Normalize by plays or snaps to avoid misleading per-game numbers.

Step 2: Visualize

Suggested visuals and why to use them:

  • Bar chart (mean ± SE): quick at-a-glance comparisons by position
  • Radar chart: multi-metric profile for a single position group (useful for show-and-tell)
  • Boxplot / violin plot: reveals distribution, skew and outliers
  • Heatmap: position vs. metric matrix with color showing performance gap (Indiana minus Miami)

Classroom tip: start with spreadsheets for accessibility, then show how the same charts are created in Python/R to expose computational workflows.

Step 3: Statistical testing — practical walk-through

Use this simplified protocol appropriate for secondary or introductory college students.

  1. State two-sided null hypotheses for each metric: "No difference between Indiana and Miami for X (per-play rate)".
  2. Check assumptions quickly: if sample sizes are small or skewed, prefer Mann–Whitney U or permutation tests; if roughly normal, use t-test.
  3. Compute effect size: Cohen's d or rank-biserial correlation for non-parametric tests.
  4. Report 95% confidence intervals — bootstrap resampling is a great classroom method because it's intuitive and robust.
  5. Correct for multiple tests: if testing across 10 position/metric combinations, use Benjamini–Hochberg false-discovery-rate (FDR) control rather than the more conservative Bonferroni correction; the contrast between the two is a good springboard for classroom discussion of Type I and Type II errors.
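Step 5's Benjamini–Hochberg procedure is short enough to implement from scratch, which makes the ranking logic visible to students (libraries such as statsmodels also provide it). The p-values below are illustrative, not from real game data.

```python
import numpy as np

def benjamini_hochberg(p_values, alpha=0.05):
    """Return a boolean array marking which hypotheses are rejected
    under Benjamini-Hochberg FDR control at level alpha."""
    p = np.asarray(p_values, dtype=float)
    m = len(p)
    order = np.argsort(p)
    ranked = p[order]
    # Find the largest k with p_(k) <= (k/m) * alpha
    thresholds = (np.arange(1, m + 1) / m) * alpha
    below = ranked <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])  # last rank passing its threshold
        reject[order[: k + 1]] = True     # reject that one and all smaller p's
    return reject

# Ten illustrative p-values, one per position/metric combination
pvals = [0.001, 0.008, 0.012, 0.04, 0.05, 0.2, 0.3, 0.5, 0.7, 0.9]
print(benjamini_hochberg(pvals))  # rejects the three smallest p-values
```

Running the same p-values through a Bonferroni cutoff (0.05/10 = 0.005) rejects only one hypothesis, which concretely shows students why BH trades a little Type I risk for much better power.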

Interpretation checklist

  • Is the p-value small enough to reject the null given your correction method?
  • Is the effect size practically meaningful (e.g., 0.2 small, 0.5 medium, 0.8 large for Cohen's d)?
  • Are there confounders — game script, opponent quality, injuries — that could explain differences?
  • Do visualizations tell the same story as the tests?

Classroom-ready rubric & assessment

Use this simple 20-point rubric for the final report and presentation.

  • Data integrity & cleaning: 4 points (clear steps, justified normalization)
  • Visualizations: 6 points (clarity, appropriate choice, labeling)
  • Statistical testing: 6 points (correct test, effect size, CI, multiple comparisons)
  • Interpretation & limitations: 4 points (real-world reasoning, evidence-based)

Extensions and differentiation

Make tasks easier or harder depending on learner level:

  • Intro level: focus on one position pair (e.g., Indiana OL vs. Miami DL) and a bar chart with brief written interpretation.
  • Intermediate: add distribution plots and a permutation test implemented in a spreadsheet (by randomly shuffling team labels) or in Python.
  • Advanced: fit a logistic regression predicting play success using interaction terms (team*position) and discuss model assumptions.
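For the advanced track, a team × position interaction model can be sketched with scikit-learn and hand-built dummy/interaction columns (statsmodels' formula interface is an alternative if you want p-values directly). The data here is synthetic, with a made-up team effect; in class, `success` would come from the per-play success-rate definition in Step 1.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 400
df = pd.DataFrame({
    "team": rng.choice(["Indiana", "Miami"], n),
    "position": rng.choice(["OL", "DL", "WR"], n),
})
# Synthetic play-success outcome with a mild (made-up) team effect
base = 0.45 + 0.10 * (df["team"] == "Indiana")
df["success"] = rng.random(n) < base

# Dummy-code team and position, then add explicit interaction columns
X = pd.get_dummies(df[["team", "position"]], drop_first=True)
for col in [c for c in X.columns if c.startswith("position_")]:
    X[f"{col}_x_Indiana"] = X[col].astype(int) * (df["team"] == "Indiana").astype(int)

model = LogisticRegression().fit(X, df["success"])
coefs = pd.Series(model.coef_[0], index=X.columns)
print(coefs)
```

A good discussion prompt: the interaction coefficients ask "does the team gap differ by position group?", which is exactly the question a position-by-position breakdown raises informally.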

Practical coding snippets (classroom-friendly) — pseudo-code

Below are high-level steps students can follow in Python or R. Keep these as quick reference; include full scripts in your teacher pack.

  • Load and merge team play-by-play data and roster positions
  • Group by team and position: compute rates per 100 plays
  • Create visualizations (bar/box/heatmap)
  • Run test (t-test, Mann–Whitney, or permutation) and bootstrap CIs
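The four steps above can be sketched as one compact pipeline. Everything here is synthetic stand-in data with hypothetical column names (`player_id`, `position_group`, etc.); the full scripts in a teacher pack would load the real CSVs instead.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Step 1: load and merge play-by-play with roster positions (synthetic here)
pbp = pd.DataFrame({
    "team": rng.choice(["Indiana", "Miami"], 300),
    "player_id": rng.integers(0, 30, 300),
    "yards": rng.normal(5.5, 3.0, 300).round(1),
})
roster = pd.DataFrame({
    "player_id": range(30),
    "position_group": rng.choice(["QB", "RB", "WR"], 30),
})
merged = pbp.merge(roster, on="player_id")

# Step 2: per-100-play yardage rates by team and position group
rates = (merged.groupby(["team", "position_group"])["yards"]
               .mean().mul(100).rename("yards_per_100").reset_index())

# Steps 3-4: permutation test on one metric (QB yards, Indiana vs. Miami)
qb = merged[merged["position_group"] == "QB"]
a = qb.loc[qb["team"] == "Indiana", "yards"].to_numpy()
b = qb.loc[qb["team"] == "Miami", "yards"].to_numpy()
obs = a.mean() - b.mean()
pooled = np.concatenate([a, b])
perm = np.empty(2000)
for i in range(2000):
    s = rng.permutation(pooled)
    perm[i] = s[:len(a)].mean() - s[len(a):].mean()
p = np.mean(np.abs(perm) >= abs(obs))

print(rates.head())
print(f"QB mean diff = {obs:.2f}, permutation p = {p:.3f}")
```

Because the pipeline is a straight line from merge to inference, it maps one-to-one onto the three-lesson sequence and is easy to split across class periods.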

Common pitfalls & how to teach them

  • Comparing raw counts — fix with per-play or per-snap normalization.
  • Misleading visual scales — always use consistent axes when comparing teams.
  • Over-interpreting p-values — teach effect size and confidence intervals as primary evidence.
  • Ignoring context — teach students to integrate qualitative game context (weather, injuries, play-calling) into their conclusions.

Why now: the 2026 context

Recent years have seen three developments that make this lesson especially powerful in 2026:

  1. Greater availability of public play-by-play datasets and curated college football APIs — easier classroom access to real data.
  2. Improved educational tools and cloud notebooks that let students explore datasets without local installs (e.g., free cloud-based Jupyter environments in many districts).
  3. Growing emphasis in STEM curricula on data literacy and computational thinking — sports analytics is an engaging context that maps directly to standards.

These trends mean teachers can realistically take students from data scraping to inference in a short unit, supporting both computational and statistical standards.

Classroom logistics: time, assessment and accessibility

  • Time: 3 lessons (60–90 minutes each) or an extended unit across a week.
  • Assessment: mix of practical (visualization + code/notebook) and written interpretation (300–500 words).
  • Accessibility: provide CSVs and spreadsheet templates; include step-by-step guides for students without coding experience.
  • Ethics & licensing: review dataset licenses; use only public or properly licensed extracts in class.

Sample student prompt (ready to hand out)

“Using the provided Indiana and Miami position-by-position dataset, produce a one-page visual comparison and a short report (300–500 words). Your report must include: (a) one visualization comparing a chosen metric for at least three position groups, (b) the result of one statistical test with effect size and confidence interval, and (c) two limitations of your analysis.”

Real-world examples & educator experience

Teachers who have used recent championship matchups report higher student engagement and deeper understanding of statistical nuance. One district replaced a generic "mean and median" lesson with a sports-analytics module and observed greater retention of hypothesis-testing concepts. Use that anecdotal endorsement to justify administrative buy-in and parent outreach when scheduling the unit near the championship.

Closing: Key takeaways and next steps

  • Engagement + rigor: Indiana vs. Miami provides a compelling context for teaching rigorous data literacy.
  • Practicality: The unit is adaptable — spreadsheets for beginners, Python/R for advanced students.
  • Interpretation first: Always pair statistical outputs with real-world caveats and effect-size thinking.

Call-to-action

Download the free classroom pack (cleaned CSVs, step-by-step notebooks, printable worksheets and a teacher rubric) to run this unit next week. Try the 60-minute starter activity with your class and share student visualizations with our educator community for feedback and curriculum-aligned badges.

Sources & further reading: CBS Sports position-by-position preview for Indiana vs. Miami; public play-by-play APIs (College Football Data), Sports-Reference box scores, and pedagogy resources on bootstrap and permutation tests. For links and full lesson materials, download the teacher pack.


