The Model
Well, now it's time to try to build a predictive model that brings together all the different aspects I've looked at.
Before getting into the detail of the model and how it went, I want to briefly cover off what I am not including:
- Weather: Analysis showed it didn't have a material impact on the outcome of a game (it had a minor effect in reducing overall scoring, which might be linked to a higher likelihood of upsets, but not enough to be material).
- Head to head history between teams: Analysis showed this was not material, and with team ELO and momentum, I'm already covering enough macro-level variables.
- Geospatial analysis within player impact scores: The biggest weakness of my player impact score model was that it did not include geospatial analysis (i.e., where a player is on the field and where they move the ball after disposal). I do not have access to this information, and will need to accept it as a limitation.
- Data pre-2010: I needed to set a cut-off somewhere, and this was it.
When I first started building out this model, my aim was to be able to predict the winner of a game of footy ~80% of the time. Now, footy is an unpredictable sport (see here), but I had faith I could decipher the indecipherable. So, how did I go?
The model made two types of predictions:
- The winner of a game
- The margin
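To make that structure concrete, here's a minimal sketch of how a two-headed model like this could be wired up in Python with scikit-learn. The file name, feature columns (elo_diff, player_impact_diff, momentum_diff, home_advantage) and model choices are all illustrative assumptions, not my actual pipeline:

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression, Ridge
from sklearn.metrics import accuracy_score, mean_absolute_error

# Hypothetical feature table: one row per game. File name and columns are illustrative only.
games = pd.read_csv("games_with_features.csv")
features = ["elo_diff", "player_impact_diff", "momentum_diff", "home_advantage"]

train = games[games["season"] < 2024]   # fit on earlier seasons
test = games[games["season"] == 2024]   # evaluate on a held-out season

# Winner model: probability that the home team wins.
winner_model = LogisticRegression()
winner_model.fit(train[features], train["home_win"])
win_prob = winner_model.predict_proba(test[features])[:, 1]
print("Winner accuracy:", accuracy_score(test["home_win"], (win_prob > 0.5).astype(int)))

# Margin model: predicted home-team margin in points.
margin_model = Ridge()
margin_model.fit(train[features], train["home_margin"])
print("Margin MAE:", mean_absolute_error(test["home_margin"], margin_model.predict(test[features])))
```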
The model was able to predict a winner 70.2% of the time. This put it above any of the individual pieces of the model, as well as just picking the home team, but well short of my 80% ambition:
| Model | Accuracy (%) | AUC |
|---|---|---|
| Home team only | 56.6 | N/A |
| Older team only | 63.0 | N/A |
| Top-down (Model v1) | 61.7 | 0.672 |
| Player-based sum (Model v2a) | 63.7 | 0.708 |
| Player-based sum (Model v2b) | 65.8 | 0.714 |
| ELO (basic, margin-adjusted) | 67.4 | 0.737 |
| The Model | 70.2 | 0.773 |
AUC (Area Under the Curve) measures how well the model ranks winners above losers. An AUC of 0.5 means random guessing, while 1.0 is perfect discrimination. My best model's AUC of 0.773 means that if you took one randomly chosen game where the home team won and one where it lost, the model would assign a higher home-win probability to the former about 77% of the time.
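To make that interpretation concrete, here's a small self-contained sketch (with made-up toy probabilities, not my actual predictions) showing that scikit-learn's roc_auc_score matches the pairwise-ranking view of AUC:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# y_true: 1 if the home team won, 0 otherwise; y_prob: model's home-win probability.
# Small made-up example purely to illustrate the metric.
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_prob = np.array([0.81, 0.35, 0.66, 0.52, 0.48, 0.61, 0.74, 0.22])

print("AUC:", roc_auc_score(y_true, y_prob))

# Equivalent pairwise interpretation: how often a won game outranks a lost game.
wins, losses = y_prob[y_true == 1], y_prob[y_true == 0]
pairs = [(w > l) + 0.5 * (w == l) for w in wins for l in losses]
print("Share of correctly ranked win/loss pairs:", np.mean(pairs))
```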
I also took a closer look at how well-calibrated the model's confidence was, and it held up reasonably well. Taking the 2024 season as an example:
Share of games within each confidence band (%):

| How sure? | Correct (%) | Wrong (%) | Total (%) |
|---|---|---|---|
| 50-60% sure | 57.14 | 42.86 | 100.00 |
| 60-70% sure | 61.11 | 38.89 | 100.00 |
| 70-90% sure | 80.00 | 20.00 | 100.00 |
| 90%+ sure | 100.00 | 0.00 | 100.00 |
| All games | 69.44 | 30.56 | 100.00 |

Number of games:

| How sure? | Correct | Wrong | Total |
|---|---|---|---|
| 50-60% sure | 40 | 30 | 70 |
| 60-70% sure | 33 | 21 | 54 |
| 70-90% sure | 60 | 15 | 75 |
| 90%+ sure | 17 | 0 | 17 |
| All games | 150 | 66 | 216 |
- When the model was quite sure of the winner, the predicted team almost always won.
- When the model wasn't so sure, accuracy dropped back towards a coin flip.
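For anyone curious how a table like that gets built, here's a rough sketch of the bucketing logic in pandas. The confidence and correctness arrays are random stand-ins rather than my actual 2024 predictions:

```python
import numpy as np
import pandas as pd

# Stand-in data: the model's confidence in its tip and whether the tip was correct.
# In practice these would come from the real 2024 predictions.
rng = np.random.default_rng(0)
confidence = rng.uniform(0.5, 1.0, size=216)   # probability assigned to the tipped team
correct = rng.random(216) < confidence         # simulated "tip was correct" flag

# Group predictions into the same confidence bands used above.
buckets = pd.cut(
    confidence,
    bins=[0.5, 0.6, 0.7, 0.9, 1.0],
    labels=["50-60% sure", "60-70% sure", "70-90% sure", "90%+ sure"],
    include_lowest=True,
)

summary = (
    pd.DataFrame({"bucket": buckets, "correct": correct})
    .groupby("bucket", observed=True)["correct"]
    .agg(games="size", accuracy="mean")
)
print(summary)
```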
When I ran the model to predict the margin of a game, there was a reasonable correlation between predicted and actual margin.
The MAE (mean absolute error) was ~28 points, a ~20% improvement on the baseline MAE you would get by predicting a margin of 0 for every game (i.e. a draw).
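As a quick illustration of that baseline comparison, here's how the two MAEs could be computed; the margin arrays are toy values, not real results:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

# Toy arrays of actual and predicted home-team margins (points), for illustration only.
actual_margin = np.array([12, -35, 4, 51, -8, 27, -19, 3])
predicted_margin = np.array([8, -20, -5, 33, 1, 15, -30, 10])

model_mae = mean_absolute_error(actual_margin, predicted_margin)
baseline_mae = mean_absolute_error(actual_margin, np.zeros_like(actual_margin))  # always tip a draw

print(f"Model MAE: {model_mae:.1f} points")
print(f"Baseline (0-margin) MAE: {baseline_mae:.1f} points")
print(f"Improvement over baseline: {1 - model_mae / baseline_mae:.0%}")
```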
So, what does this all mean? Well, in a tipping competition I would be able to tip a little under 6.5 games correctly each round (0.702 × 9 games in a full round ≈ 6.3), and if I wanted to gamble I would probably end up making the bookmakers a lot of money.
Are there ways to improve the model? Honestly, I think there are a couple of ways:
- More in-depth analysis of home ground advantage, especially at specific grounds.
- Better analysis of player impact, combining age, geospatial data, and player synergy—this is probably a whole project in and of itself.
Overall though, the model can consistently pick a winner, even if it falls slightly short of the level I thought it could achieve. Next steps are to build out a way to track performance over time.
The full code for this project can be found on GitHub.