Anything Else?

I’ve now covered what will form the basis for my predictive model: ELO, home ground, team age, and building a model from players up.

What's left to look at? Well, there might be something that I am missing so I will test out a couple more theories.

Head to head history: do some teams have an edge over others?

I put this to the test by looking at head to head history going back 3, 4, 5, 6, and 7 games. For each of these windows I checked whether there was any relationship between the number of wins and the result of the next game. Across all of the windows the accuracy was consistently around 58-59%, i.e. the team with the better history won 58-59% of the time. This is well above random chance (and better than just picking the home team).
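As a rough illustration of this kind of check (a minimal sketch, not the code from the repository), the function below walks through games in date order, looks at the last n meetings between the two teams, and counts how often the team with more wins in that window wins the next game. The tuple layout, the function name, and the choice to skip tied histories, draws, and pairs with fewer than n previous meetings are all assumptions made for the example.

```python
from collections import defaultdict, deque

def head_to_head_accuracy(games, n):
    """Share of games won by the team with the better record over the last n meetings.

    games: chronological list of (team_a, team_b, winner) tuples (hypothetical layout);
    winner is None for a draw.
    """
    # For each pair of teams, remember the winners of their last n meetings.
    history = defaultdict(lambda: deque(maxlen=n))
    correct = total = 0
    for team_a, team_b, winner in games:
        past = history[frozenset((team_a, team_b))]
        if len(past) == n and winner is not None:
            wins_a = sum(1 for w in past if w == team_a)
            wins_b = sum(1 for w in past if w == team_b)
            if wins_a != wins_b:  # assumption: ignore tied head to head records
                favoured = team_a if wins_a > wins_b else team_b
                total += 1
                correct += favoured == winner
        past.append(winner)  # update the history only after scoring the game
    return correct / total if total else None

# Toy data, purely illustrative
toy_games = [
    ("A", "B", "A"), ("B", "A", "A"), ("A", "B", "A"),
    ("A", "B", "A"), ("A", "B", "B"),
]
accuracy = head_to_head_accuracy(toy_games, 3)
```

Running this for n = 3 through 7 on the real fixture list would give one accuracy figure per window, comparable to the 58-59% range quoted above.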

One thing I am cognizant of is not double counting the same effect in different parts of the model. While head to head history does seem to have some relevance to the outcome, it is not as meaningful as either ELO or the player impact model I built. It is most likely that a team with a positive head to head history will win because it is the better team (and that is already captured by other parts of the model). The R squared value of this variable (~2% across every head to head window that I looked at) tells me that adding in head to head history gives me little information above and beyond what I already have in the model.
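For readers unfamiliar with R squared, here is a minimal sketch of how it can be computed for a single explanatory variable. It assumes a simple linear regression of some outcome (for example the game margin) on some head to head measure (for example the difference in wins); which variables were actually used is not stated here, so both are placeholders. For a one-variable regression, R squared equals the squared correlation between the two series.

```python
def r_squared(x, y):
    """R squared of a simple linear regression of y on x (squared Pearson correlation)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)

# Toy data: a perfectly linear relationship and one with no linear relationship
perfect = r_squared([1, 2, 3], [2, 4, 6])
unrelated = r_squared([1, 2, 3, 4], [1, -1, -1, 1])
```

A value of around 0.02 means the variable explains roughly 2% of the variation in the outcome, which is why it adds so little on its own.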

In short: head to head history itself won't be added to the model because it's covered by other parts already.

The weather: does rain make a difference?

Next up I want to take a look at whether rain makes a difference, specifically looking at whether shorter teams play better if it rains. So, at a basic level, rain does mean that games are slightly lower scoring, with the likelihood of blowouts being lower:

Absolute Margin by Weather
Rain? | Median | Mean
No | 29.0 | 34.21
Yes | 27.0 | 32.36
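A summary like this can be produced by grouping games on a rain flag and taking the median and mean of the absolute margin. The sketch below uses only the standard library; the (rained, margin) input layout and the function name are assumptions for the example, and the numbers are toy data rather than real results.

```python
from statistics import mean, median

def margin_summary(games):
    """Median and mean absolute margin, split by whether it rained.

    games: iterable of (rained, margin) pairs (hypothetical layout); margin may be signed.
    """
    groups = {True: [], False: []}
    for rained, margin in games:
        groups[bool(rained)].append(abs(margin))
    return {
        rained: {"median": median(values), "mean": mean(values)}
        for rained, values in groups.items()
        if values  # skip a group with no games
    }

# Toy data, purely illustrative
toy_games = [(False, -10), (False, 30), (False, 50), (True, 20), (True, -40)]
summary = margin_summary(toy_games)
```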

When we add height to the equation, rain has only a very slight impact on the taller team's win rate:

Rain? | Taller Team Win Rate
No | 47.79%
Yes | 46.47%
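One way to arrive at a split like this is to flag, for each game, whether the taller team won, and then average that flag separately for wet and dry games. The sketch below assumes each game is given as (rained, height_a, height_b, margin_a), where the heights are team averages and margin_a is team A's winning margin; it also assumes draws and equal-height games are dropped. None of these choices are confirmed by the analysis above.

```python
def taller_team_win_rate(games):
    """Win rate (%) of the taller team, split by whether it rained.

    games: iterable of (rained, height_a, height_b, margin_a) tuples (hypothetical layout).
    """
    wins = {True: 0, False: 0}
    counted = {True: 0, False: 0}
    for rained, height_a, height_b, margin_a in games:
        if height_a == height_b or margin_a == 0:
            continue  # assumption: skip equal heights and drawn games
        # The taller team won if "A is taller" and "A won" are both true or both false.
        taller_won = (height_a > height_b) == (margin_a > 0)
        counted[bool(rained)] += 1
        wins[bool(rained)] += taller_won
    return {r: 100 * wins[r] / counted[r] for r in counted if counted[r]}

# Toy data, purely illustrative
toy_games = [
    (False, 190, 188, 10), (False, 187, 189, 5),
    (True, 190, 188, -3), (True, 186, 188, -7),
]
rates = taller_team_win_rate(toy_games)
```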

While it was surprising that the taller team was more likely to lose, the impact of rain was not really enough to make me want to add it to the model. Running regression analysis over both the win/loss outcome and the margin told me that rain and height were not material. Rain ended up being a little bit like Linkin Park: in the end, it didn't really matter.[1]

Next Steps

So, now I think I am ready to try to bring the model together to (hopefully) predict game winners!

For more information on the different analyses run, please visit the GitHub repository.

Footnotes