What can best predict the winner of a game of footy?

This article is the first in a series where I will try to build a model to predict games of footy.

Predicting the winner of a game of footy is the only acceptable form of tipping in Australia. It's not just about predicting the winner though: you can also tip who'll kick the first goal, who'll touch the footy the most and what the scores will be at half-time, then combine any and all of these predictions into a "multi" along with which 3 horses will finish the fastest in some race happening on the other side of the country. This isn't to be anti-gambling – Australia, just like nearly every other country in the world, gambles on sport. It's what funds the advertising, and therefore the wages, of players, coaches – the whole industry[1].

It is more to say that there is already an industry in place that predicts the outcome of a game – you can put your money where your mouth is and guess the winner. If you are right, the bookmakers will pay you (though if you get it right too often, they will probably ban you).

This means there is already a benchmark we can compare ourselves to when we try to predict the winner. Can we create a model that predicts the winner more accurately than the odds put forward by gambling companies? Can we look into the universe, find some pattern and use it to make a prediction? Apparently it's a pretty good feeling[2], so let's give it a go.

In the wise words of Paul Kelly, from little things, big things grow[3]. To begin with, let's build a model using some variables that seem relevant. This first model will be relatively basic, but it will at least give us an opportunity to jump into the dataset. For now, my aim is for the model to provide a better prediction than just tipping the home team. I'll leave beating the gambling companies to a more complex version.

I ran a logistic regression on the variables below[4]. This basically looks at how much the chances of victory change when a variable changes. For example, if a team scored more points in its previous 5 games, to what extent does this increase its chances of victory? If you want more detail on the exact work, you can look at this page here. Otherwise, for a high-level breakdown, let's take a look at the data:

| Variable | Description | Logistic regression coefficient[5] | So, does it matter? |
| --- | --- | --- | --- |
| Home ground advantage | The assumption here is that playing at home brings an advantage. It was measured by whoever was listed as the home team per afl.com, regardless of whether it was a neutral venue or the game had been sold. | 0.312109 | Yes, slight advantage for home teams |
| Team stability (over 2 rounds) | The assumption is that a team which has fewer changes is more likely to be a "winning" team. This was measured by how many players were different from the previous 2 rounds. If a player played 1 game, then skipped a game and played again they would not be treated as a different player.[6] | -0.083847 | Team instability is a relatively weak negative indicator |
| Player ages | There is a general understanding that teams in a "win now" phase are more likely to play players between the ages of 25-30, and that teams who are rebuilding will play the kids. This was measured by the players' age on game day (game date less date of birth). | 0.270478 | Yes, teams with an older average age are slightly more likely to win |
| Recent form (scored last 5) | The more a team has scored over their last 5 games, the more likely they are to win. This was measured as the average points scored over the previous 5 games. | 0.272462 | Yes, higher scoring teams in recent games are slightly more likely to win |
| Recent form (conceded last 5) | The more a team has conceded over their last 5 games, the less likely they are to win. This was measured as the average points conceded over the previous 5 games. | -0.229053 | Yes, conceding more points in recent games reduces the chance of winning |
| Days since previous game | If a team is tired because they didn't get enough rest, then they should be less likely to win. Measured in days between games. | -0.030959 | No, there is almost no relationship between rest days and winning |
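
One way to read these coefficients, assuming a standard logistic regression: each coefficient is the change in the log-odds of winning for a one-unit increase in that variable, so exponentiating it gives an odds ratio. The sketch below does that for the coefficients in the table. The dictionary keys and the `odds_ratio` helper are names made up for illustration, and the article does not say what scale each variable is on (raw or standardised), so "one unit" is an assumption that depends on how the data was prepared.

```python
import math

# Coefficients copied from the table above; the key names are illustrative only.
coefficients = {
    "home_ground_advantage": 0.312109,
    "team_stability_2_rounds": -0.083847,
    "player_ages": 0.270478,
    "scored_last_5": 0.272462,
    "conceded_last_5": -0.229053,
    "days_since_previous_game": -0.030959,
}


def odds_ratio(coefficient):
    """Multiplicative change in the odds of winning per one-unit increase."""
    return math.exp(coefficient)


# An odds ratio above 1 raises the odds of winning, below 1 lowers them.
odds_ratios = {name: odds_ratio(c) for name, c in coefficients.items()}
for name, ratio in odds_ratios.items():
    print(f"{name:26s} odds ratio = {ratio:.3f}")
```

Read this way, being the listed home team multiplies the odds of winning by roughly 1.37 with everything else held equal, while the odds ratio for rest days sits very close to 1, which matches the "almost no relationship" reading.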

A model based on all of the above correctly predicts the winner 61.7% of the time[4]. Overall, none of these indicators are strongly predictive on their own, or even combined. I'm not at the stage of being willing to bet money on it, but at least this model beats simply tipping the home team (56.6% win rate for the same period).
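
To make the comparison concrete, here is a minimal sketch of how a model's tipping accuracy can be measured against the "always tip the home team" baseline. It is not the code behind the numbers above: the data is synthetic, the two features (`is_home`, `recent_form`) and their true effects are made up, and the logistic regression is fitted with a hand-rolled gradient descent so the example needs nothing beyond the standard library. For brevity it also scores the model on the same games it was fitted on, whereas a fair comparison would use games the model has not seen.

```python
import math
import random

random.seed(0)  # fixed seed so the synthetic data is reproducible


def sigmoid(z):
    """Logistic function: maps a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, labels, learning_rate=0.5, epochs=300):
    """Fit logistic regression weights and an intercept by batch gradient descent."""
    weights = [0.0] * len(rows[0])
    intercept = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            p = sigmoid(intercept + sum(w * xi for w, xi in zip(weights, x)))
            error = p - y  # gradient of the log-loss with respect to the log-odds
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        intercept -= learning_rate * grad_b / n
    return weights, intercept


def accuracy(tips, results):
    """Share of tips that matched the actual result."""
    return sum(t == r for t, r in zip(tips, results)) / len(results)


# Synthetic team-games: [is_home, recent_form], with made-up true effects.
rows, won = [], []
for _ in range(1000):
    is_home = random.choice([0, 1])
    recent_form = random.gauss(0, 1)  # hypothetical standardised form score
    p_win = sigmoid(-0.15 + 0.3 * is_home + 0.8 * recent_form)
    rows.append([is_home, recent_form])
    won.append(1 if random.random() < p_win else 0)

weights, intercept = fit_logistic(rows, won)

# Model tip: a win whenever the predicted probability is at least 50%.
model_tips = [
    1 if sigmoid(intercept + sum(w * xi for w, xi in zip(weights, x))) >= 0.5 else 0
    for x in rows
]
# Baseline tip: a win whenever the team is the listed home team.
home_tips = [x[0] for x in rows]

model_accuracy = accuracy(model_tips, won)
baseline_accuracy = accuracy(home_tips, won)
print(f"model:    {model_accuracy:.1%}")
print(f"baseline: {baseline_accuracy:.1%}")
```

The important design choice is that both sets of tips are scored against the same games with the same `accuracy` function, which is what makes a pair of figures like 61.7% and 56.6% comparable.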

It doesn't beat tipping the home team by all that much though, so next I will:

Footnotes

  1. Though footy clubs go one step further by also relying on pokies (gambling machines) mainly located in low income areas to subsidise their football operations.
  2. https://www.youtube.com/watch?v=8ITghJhlnPA
  3. https://www.youtube.com/watch?v=6_ndC07C2qw
  4. Please go here for the detailed data
  5. The closer the number is to 0, the lower the impact. The further it is from 0, the higher the impact, with negative values associated with defeat and positive values with victory.