Help & Documentation
Learn how to use the NFL Simulation Engine. This guide covers everything from basic concepts to advanced workflows.
Overview: What Does This System Do?
The NFL Simulation Engine predicts NFL game outcomes using a Bayesian statistical model. Here's what it does:
- Trains models on historical play-by-play data to learn team strengths, QB effects, and situational factors
- Simulates games thousands of times to generate win probabilities and score distributions
- Backtests model performance on historical games to evaluate accuracy
- Calibrates predictions with betting market data (optional)
Example Use Case
You want to predict the outcome of Chiefs vs Bills. The system:
- Uses trained model to estimate each team's offensive/defensive strength
- Accounts for QB quality (Mahomes vs Allen)
- Simulates the game 10,000 times
- Reports: "Chiefs win 58% of simulations, average score 27-24"
Key Concepts
1. EPA (Expected Points Added)
EPA measures how many points a play is worth on average. A +3 EPA play means the team gained about 3 points of expected value.
The model predicts EPA for each play based on:
- Team offensive/defensive strength
- QB quality
- Down and distance
- Field position
- Game situation (score, time remaining)
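For intuition, here is a minimal sketch of the EPA calculation. The expected-points values below are made up for illustration; the engine derives them from its trained model.

```python
# Minimal sketch of the EPA idea (expected-points values here are
# illustrative, not the engine's actual estimates).
# EPA = expected points after the play minus expected points before it.

ep_before = 2.4  # e.g., 1st & 10 at midfield
ep_after = 5.1   # e.g., 1st & 10 at the opponent's 20 after a long completion

epa = ep_after - ep_before
print(f"EPA for this play: {epa:+.1f}")  # +2.7 expected points added
```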
2. Model Training
Training teaches the model patterns from historical data. You specify:
- Seasons: Which years of data to use (e.g., 2016-2024)
- Profile: How thorough the training should be
  - Dev: Quick test (~5 min, 50k samples)
  - Fast: Standard training (~15 min, 120k samples)
  - Full: Comprehensive (~2+ hours, all data)
- Inference Method: How the model learns
  - ADVI: Faster, approximate (good for dev/fast)
  - NUTS: Slower, exact MCMC (best for full training)
3. Simulation
Simulation runs the game thousands of times using the trained model. Each simulation:
- Generates drives using EPA predictions
- Converts drive EPA to points
- Alternates possessions between teams
- Records final score
After 10,000 simulations, you get distributions: "Chiefs win 58% of the time, average score 27-24."
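The sketch below shows the Monte Carlo loop in miniature. It draws final scores from hypothetical distributions rather than generating drives from EPA predictions, so treat it as an illustration of the idea, not the engine's actual drive model.

```python
import numpy as np

# Toy Monte Carlo loop: draw final scores from hypothetical distributions
# (the real engine builds each game drive-by-drive from EPA predictions).
rng = np.random.default_rng(42)
n_sims = 10_000

home = rng.normal(loc=27.0, scale=7.0, size=n_sims).clip(min=0).round()
away = rng.normal(loc=24.0, scale=7.0, size=n_sims).clip(min=0).round()

print(f"Home win probability: {(home > away).mean():.1%}")  # ties ignored here
print(f"Average score: {home.mean():.1f}-{away.mean():.1f}")
```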
4. Model Artifacts
Each trained model is saved as an "artifact" with:
- Model Hash: Unique identifier (e.g., "a3f2b1c9")
- Seasons: Training data used
- Metrics: Performance statistics
- Active Status: Whether it's used by default
You can have multiple models and switch between them. Only one can be "active" at a time.
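Conceptually, an artifact record looks something like the dictionary below. The field names and values are hypothetical; check the Model Registry for the actual schema.

```python
# Hypothetical artifact record (field names are illustrative only).
artifact = {
    "model_hash": "a3f2b1c9",
    "seasons": [2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023, 2024],
    "metrics": {"brier_score": 0.21, "log_loss": 0.62},
    "active": True,  # only one model can be active at a time
}
```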
Basic Workflow
Here's the typical workflow for using the system:
Step 1: Train a Model
First, you need a trained model. Go to Train Model and:
- Select seasons (e.g., "2016,2017,2018,2019,2020,2021,2022,2023,2024")
- Choose profile (start with "Fast" for ~15 minutes)
- Click "Start Training"
- Wait for completion (check job status)
Note: Training can take 15 minutes to several hours depending on profile.
Step 2: Run a Simulation
Once you have an active model:
- Go to New Simulation
- Enter teams (e.g., Home: "KC", Away: "BUF")
- Enter QBs (e.g., "Patrick Mahomes", "Josh Allen")
- Set number of simulations (default 10,000 is good)
- Click "Submit Simulation"
- View results on the job detail page
Step 3: Interpret Results
The results show:
- Win Probability: % chance home team wins
- Score Distribution: Average scores and ranges
- Spread/Total: Predicted point spread and total points
- Quantiles: 5th, 25th, 50th, 75th, 95th percentiles
Training a Model: Detailed Guide
When to Train
- First time using the system
- After new season data becomes available
- When you want to test different training configurations
- When you want to use different season ranges
Training Parameters Explained
Seasons
Format: Comma-separated list (e.g., "2016,2017,2018,2019,2020,2021,2022,2023,2024")
Recommendation: Use recent seasons (last 5-8 years). Older data may be less relevant due to rule changes.
Training Profile
| Profile | Time | Samples | Chains | Use Case |
|---------|------|---------|--------|----------|
| Dev | ~5 min | 50k | 2 | Quick testing, development |
| Fast | ~15 min | 120k | 2 | Standard use, good balance |
| Full | ~2+ hours | All data | 4 | Production, best accuracy |
| Overnight | ~4+ hours | All data | 4 (NUTS) | Highest quality, run overnight |
Inference Method
- Auto: Uses ADVI for dev/fast, NUTS for full (recommended)
- ADVI: Faster, approximate Bayesian inference
- NUTS: Slower, exact MCMC sampling (best quality)
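If you are curious what the two methods look like in code, here is a toy contrast assuming a PyMC-style workflow; the engine's actual model and training code are not shown in this guide.

```python
import pymc as pm

# Toy model contrasting ADVI and NUTS, assuming a PyMC-style workflow.
with pm.Model():
    mu = pm.Normal("mu", 0.0, 1.0)
    pm.Normal("obs", mu, 1.0, observed=[0.1, -0.3, 0.2])

    # ADVI: fast variational approximation (what dev/fast profiles favor)
    approx = pm.fit(n=10_000, method="advi")
    advi_draws = approx.sample(1_000)

    # NUTS: slower but higher-quality MCMC (what full/overnight favor)
    nuts_draws = pm.sample(draws=1_000, chains=2)
```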
Example: Your First Training
For Beginners:
- Go to Train Model
- Seasons: 2016,2017,2018,2019,2020,2021,2022,2023,2024
- Profile: Fast (good balance of speed and quality)
- Inference: Auto
- Click "Start Training"
- Wait ~15 minutes (check job status page)
- Model will be set as active automatically
Monitoring Training Progress
After starting training:
- You'll be redirected to the job detail page
- Watch the log for progress updates
- Status will change: Queued → Running → Succeeded
- When complete, model appears in Model Registry
Running Simulations: Detailed Guide
Required Information
- Home Team: 2-3 letter abbreviation (e.g., "KC", "BUF", "SF")
- Away Team: 2-3 letter abbreviation
- Home QB: Full name (e.g., "Patrick Mahomes")
- Away QB: Full name (e.g., "Josh Allen")
Optional Parameters
- Number of Simulations: Default 10,000 is good. More simulations reduce random noise in the results but take longer to run.
- Random Seed: For reproducibility (default 42 is fine)
- Model: Use active model (default) or select specific model
- Market CSV: Path to betting market data (advanced)
- Market Blend Weight: 0-1, how much to blend with market (0 = model only)
Example: Chiefs vs Bills
Scenario: You want to simulate Chiefs (home) vs Bills (away)
- Go to New Simulation
- Home Team: KC
- Away Team: BUF
- Home QB: Patrick Mahomes
- Away QB: Josh Allen
- Simulations: 10000 (default)
- Click "Submit Simulation"
Result: You'll see win probability, score distributions, and more.
Understanding QB Names
QB names should match how they appear in NFL data. Common formats:
- "Patrick Mahomes" (not "P. Mahomes" or "Mahomes")
- "Josh Allen" (not "J. Allen")
- "Lamar Jackson" (full name)
Tip: If unsure, check recent game logs or use the QB's full name as it appears in official NFL stats.
Market Calibration (Advanced)
Market calibration blends your model's predictions with betting market consensus:
- w = 0.0: Use only model predictions (default)
- w = 0.5: Equal blend of model and market
- w = 1.0: Use only market predictions
When to use: If you have betting market data and want to incorporate market wisdom.
Requires a CSV file with columns: game_id, spread_home, total, home_team, away_team.
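The blend itself is a simple weighted average. A minimal sketch, assuming a linear mix on win probability (the engine may also blend other quantities such as spread and total):

```python
# Minimal sketch of market blending, assuming a linear mix on win
# probability; w matches the Market Blend Weight parameter above.
def blend(model_prob: float, market_prob: float, w: float) -> float:
    """w = 0.0 -> model only; w = 1.0 -> market only."""
    return (1 - w) * model_prob + w * market_prob

print(blend(model_prob=0.582, market_prob=0.550, w=0.3))  # 0.5724
```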
Running Backtests: Detailed Guide
What is a Backtest?
A backtest evaluates model accuracy by:
- Running simulations on historical games
- Comparing predictions to actual outcomes
- Calculating metrics (Brier score, log loss, calibration)
When to Run Backtests
- After training a new model (validate performance)
- Comparing different models
- Evaluating model improvements
- Understanding model strengths/weaknesses
Backtest Parameters
- Seasons: Which seasons to test on (e.g., "2023,2024")
- Simulations per Game: Default 5,000 is good
- Model: Which model to test (default: active model)
- Market CSV: Optional, for comparing to market
Example: Testing 2023 Season
- Go to Run Backtest
- Seasons: 2023
- Simulations: 5000 (default)
- Model: Use active model
- Click "Submit Backtest"
- Wait for completion (can take 30+ minutes)
- View results: accuracy metrics, calibration curves, etc.
Understanding Backtest Results
The backtest report includes:
- Brier Score: Lower is better (measures prediction accuracy)
- Log Loss: Lower is better (measures probability calibration)
- Calibration Curve: Shows if probabilities match actual frequencies
- Game-by-Game Results: Predictions vs actual for each game
Step-by-Step Examples
Example 1: Complete Beginner Workflow
Goal: Predict Chiefs vs Bills game
Step 1: Train Your First Model
- Click "Train Model" in navigation
- Seasons field: Enter 2016,2017,2018,2019,2020,2021,2022,2023,2024
- Profile dropdown: Select "Fast"
- Inference: Leave as "Auto"
- Click "Start Training"
- Wait ~15 minutes (watch the job status page)
Step 2: Run Your First Simulation
- Once training completes, click "New Simulation"
- Home Team: KC
- Away Team: BUF
- Home QB: Patrick Mahomes
- Away QB: Josh Allen
- Leave other fields as defaults
- Click "Submit Simulation"
Step 3: Interpret Results
On the results page, you'll see:
- Win Probability: e.g., "58.2%" means Chiefs win 58.2% of simulations
- Score Distribution: Average scores and ranges
- Spread: Predicted point spread (e.g., "Chiefs by 3.2 points")
Example 2: Comparing Two Models
Goal: See if a model trained on recent seasons performs better
Step 1: Train Model A (All Seasons)
- Train with seasons: 2016,2017,2018,2019,2020,2021,2022,2023,2024
- Profile: Fast
- Note the model hash (e.g., "a3f2b1c9")
Step 2: Train Model B (Recent Seasons Only)
- Train with seasons: 2020,2021,2022,2023,2024
- Profile: Fast
- Note the model hash (e.g., "b4e3c2d1")
Step 3: Run Same Simulation with Both Models
- Run simulation: Chiefs vs Bills
- First time: Use Model A (select from dropdown)
- Second time: Use Model B (select from dropdown)
- Compare results: Which gives more realistic predictions?
Step 4: Backtest Both Models
- Run backtest on 2023 season with Model A
- Run backtest on 2023 season with Model B
- Compare Brier scores: Lower is better
Example 3: Using Market Calibration
Goal: Blend model predictions with betting market data
Step 1: Prepare Market Data
Create a CSV file with columns:
game_id,spread_home,total,home_team,away_team
2024_1_KC_BUF,-3.5,52.5,KC,BUF
2024_1_SF_DAL,7.0,48.0,SF,DAL
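Before pointing the simulator at the file, you can sanity-check it. A quick example, assuming pandas is available on the server:

```python
import pandas as pd

# Quick sanity check of the market CSV (assumes pandas is installed).
df = pd.read_csv("market_data.csv")
expected = {"game_id", "spread_home", "total", "home_team", "away_team"}
missing = expected - set(df.columns)
assert not missing, f"Missing columns: {missing}"
print(df.head())
```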
Step 2: Upload CSV
Place CSV file on server (e.g., /home/azureuser/nfl-sim/market_data.csv)
Step 3: Run Simulation with Market Blend
- Go to New Simulation
- Enter teams and QBs
- Market CSV Path: /home/azureuser/nfl-sim/market_data.csv
- Market Blend Weight: 0.3 (30% market, 70% model)
- Submit simulation
Result: Predictions blend your model (70%) with market consensus (30%)
Interpreting Results
Simulation Results
Win Probability
Example: "58.2%" means the home team wins in 58.2% of simulations.
- 50% = toss-up
- >60% = strong favorite
- <40% = strong underdog
Score Distribution
Example: "Home: 27.3 (std: 7.2), Away: 24.1 (std: 6.8)"
- Mean: Average score across all simulations
- Std: Standard deviation (higher = more uncertainty)
Quantiles
Example: "Home Score 5th percentile: 15, 95th percentile: 38"
- 5th percentile: Score exceeded in 95% of simulations (low end)
- 50th percentile (median): Middle score
- 95th percentile: Score exceeded in only 5% of simulations (high end)
Interpretation: "There's a 90% chance the home team scores between 15 and 38 points."
Spread
Example: "Spread: 3.2 (std: 10.1)"
- Mean: Average point difference (home - away)
- Positive = home team favored
- Negative = away team favored
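To see how these statistics relate to raw simulation output, here is a sketch that uses synthetic draws in place of the engine's per-simulation scores:

```python
import numpy as np

# Synthetic draws standing in for the engine's per-simulation scores.
rng = np.random.default_rng(42)
home = rng.normal(27.3, 7.2, size=10_000)
away = rng.normal(24.1, 6.8, size=10_000)

spread = home - away  # positive = home team favored
print(f"Win probability: {(spread > 0).mean():.1%}")
print("Home quantiles:", np.quantile(home, [0.05, 0.25, 0.5, 0.75, 0.95]).round(1))
print(f"Spread: {spread.mean():.1f} (std: {spread.std():.1f})")
```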
Backtest Results
Brier Score
Measures prediction accuracy. Range: 0 (perfect) to 1 (worst).
- <0.20 = Excellent
- 0.20-0.25 = Good
- >0.25 = Needs improvement
Log Loss
Measures probability calibration. Lower is better.
- <0.50 = Excellent
- 0.50-0.70 = Good
- >0.70 = Needs improvement
Calibration Curve
Shows whether predicted probabilities match actual win frequencies, with predicted probability on the x-axis and observed frequency on the y-axis.
- On the diagonal: Perfect calibration
- Below the diagonal: Observed frequency lower than predicted (model is overconfident)
- Above the diagonal: Observed frequency higher than predicted (model is underconfident)
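For reference, the two headline metrics can be computed in a few lines. A sketch on hypothetical predictions p and outcomes y (1 = home win):

```python
import numpy as np

# Backtest metrics on hypothetical predictions and outcomes.
p = np.array([0.58, 0.72, 0.31, 0.65, 0.45])  # predicted home win probability
y = np.array([1, 1, 0, 0, 1])                  # actual outcome (1 = home win)

brier = np.mean((p - y) ** 2)  # lower is better
eps = 1e-12                    # guard against log(0)
log_loss = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
print(f"Brier: {brier:.3f}, Log loss: {log_loss:.3f}")
```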
Troubleshooting
Common Issues
Training Job Stuck on "Queued"
Problem: Job never starts running
Solutions:
- Check Celery workers are running: sudo systemctl status gamesim-celery-train
- Restart workers if needed: sudo systemctl restart gamesim-celery-train
- Check logs: sudo journalctl -u gamesim-celery-train -f
Simulation Returns Unexpected Results
Problem: Win probabilities seem wrong
Solutions:
- Verify QB names match NFL data exactly
- Check team abbreviations are correct (2-3 letter codes, e.g., KC, BUF)
- Ensure model is trained on recent seasons
- Try increasing number of simulations (10,000+ recommended)
Model Training Fails
Problem: Training job fails with error
Solutions:
- Check job logs for specific error message
- Verify data is available for selected seasons
- Try smaller season range first
- Check disk space: df -h
- Check memory: free -h
Can't Find QB in Model
Problem: QB name not recognized
Solutions:
- Use full name as it appears in NFL stats (e.g., "Patrick Mahomes" not "P. Mahomes")
- Check if QB played in training seasons
- Try "Unknown" if QB is not in training data (model will use average QB effect)
Getting Help
- Check job logs for detailed error messages
- Review model metrics in Model Registry
- Compare with known-good examples
- Check system resources (CPU, memory, disk)
Quick Reference
Training Profiles
- Dev: ~5 min, quick test
- Fast: ~15 min, standard
- Full: ~2+ hours, comprehensive
- Overnight: ~4+ hours, highest quality
Recommended Settings
- Seasons: Last 5-8 years
- Simulations: 10,000
- Profile: Fast (for most users)
- Inference: Auto
Team Abbreviations
- Use 2-3 letter codes
- Examples: KC, BUF, SF, DAL
- Case doesn't matter