Predicting Katowice 2017
Note: I wrote this post before the matches were played. Overall, I was pretty happy since I went three for four. I only had Team 8 at a 54% chance. Most surprising to me was that Nomia actually picked up a game on Misfits, even if they didn't win the series.
On March 3rd, the first international event of the 2017 Heroes Global Championship (HGC) will kick off. Eight teams will come together in Katowice, Poland for The Western Clash: three from Europe, three from North America, one from Latin America, and one from Australia/New Zealand. The eight teams are, by tournament seed:
- Misfits (EU)
- Tempo Storm (NA)
- Fnatic (EU)
- Team 8 (NA)
- Team Dignitas (EU)
- Gale Force eSports (NA)
- Infamous (LATAM)
- Nomia (ANZ)
If you are just curious about my predictions, feel free to skip ahead.
Prediction Methods
I use the Glicko rating system to make my predictions. It is very similar to the Elo rating system, but it also tracks how certain we are about a given player's or team's rating. For example, a team may have won every game it has ever played, but only played six games. That uncertainty shows up as a large rating deviation (the standard deviation of the rating), and the algorithm takes it into account when predicting matches.
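To make the role of the rating deviation concrete, here is a minimal sketch of the expected-score formula from Glickman's description of Glicko, written in base R. The function name `glicko_expected` and the example ratings are my own illustrations, and the `predict` function in PlayerRatings may apply further adjustments, so its numbers need not match this sketch exactly.

```r
# Expected score of team 1 against team 2 under Glicko.
# r1, r2   : ratings
# rd1, rd2 : rating deviations (uncertainty about each rating)
glicko_expected <- function(r1, rd1, r2, rd2) {
  q <- log(10) / 400
  # g shrinks toward 0 as the combined uncertainty grows,
  # which pulls the prediction toward 0.5
  g <- 1 / sqrt(1 + 3 * q^2 * (rd1^2 + rd2^2) / pi^2)
  1 / (1 + 10^(-g * (r1 - r2) / 400))
}

# Same 200-point rating gap, but the second opponent is much less certain
confident <- glicko_expected(2400, 50, 2200, 50)
uncertain <- glicko_expected(2400, 50, 2200, 300)
print(c(confident, uncertain))
```

With the same rating gap, the prediction against the high-deviation opponent sits closer to 50%, which is exactly the hedging behaviour that separates Glicko from plain Elo.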
To get enough data on all the teams, I chose to count every game as a match, rather than counting each series once. Having a best of three or best of five every week really helps fill out the rankings. I feel this is appropriate, since truly dominant teams should rarely lose games, much less series. Using the data from my SQLite database, which I collect using this project that I put together, the R code is relatively simple. The PlayerRatings package in R makes it easy to create ratings and make predictions.
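As a rough illustration of what counting every game as a match means, the sketch below expands series-level results into one row per game. The data frame `series`, its column names, and the helper `expand_series` are all hypothetical and are not taken from my database. The order of games within a series is not recovered, which should not matter when all games in a series fall in the same rating period.

```r
# Hypothetical series-level results; names and values are illustrative only
series <- data.frame(
  Week     = c(1, 1),
  HomeTeam = c("Team A", "Team C"),
  AwayTeam = c("Team B", "Team D"),
  HomeWins = c(2, 1),
  AwayWins = c(1, 2),
  stringsAsFactors = FALSE
)

# Repeat each series row once per game played, scoring a home win as 1
# and an away win as 0
expand_series <- function(s) {
  n_games <- s$HomeWins + s$AwayWins
  idx <- rep(seq_len(nrow(s)), times = n_games)
  score <- unlist(Map(function(h, a) c(rep(1, h), rep(0, a)),
                      s$HomeWins, s$AwayWins))
  data.frame(Week = s$Week[idx],
             HomeTeam = s$HomeTeam[idx],
             AwayTeam = s$AwayTeam[idx],
             Score = score,
             stringsAsFactors = FALSE)
}

games <- expand_series(series)
print(games)
```

Each 2-1 series becomes three rows, so a team that sweeps its opponents gains more rating than one that wins the same number of series in close fashion.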
In the Western Clash, the quarterfinals and losers' bracket matches will be best of three, and all other matches will be best of five. First, load the data and the required package:
library(PlayerRatings)
df.match.basic <- read.csv('./match_basic.csv')
Now we need to create the Glicko ratings and make some predictions. The first match will be Misfits against Nomia. Nomia is not currently in our database, since the database only has data from North America and Europe. As such, they are a bit of a wild card, and start out with a default rating of 2200 and a high rating deviation of 300 (the trat argument below).
# Need only the relevant columns
glicko.df <- data.frame(df.match.basic$week,
                        as.character(df.match.basic$home_team),
                        as.character(df.match.basic$away_team),
                        df.match.basic$score,
                        stringsAsFactors = FALSE)
# Need to change the names for glicko
names(glicko.df) <- c('Week', 'HomeTeam', 'AwayTeam', 'Score')
glicko.rat <- glicko(glicko.df) # the ratings
newdata.df <- data.frame(59, 'Misfits', 'Nomia')
pvals <- predict(glicko.rat, newdata.df, tng=1, trat=c(2200,300))
[1] 0.8431877
The value that pops out is the chance of the home team (the first team listed, in this case Misfits) winning a single game. Intuition would say that Misfits probably has a greater than 84% chance of winning, but the algorithm is not aware that ANZ as a region is much weaker than EU. Fixing this would require determining the average rating for each region, and then using that as the starting value for unrated teams from that region.

For the quarterfinals, we want to calculate the chance of each team winning the whole series. A team can win a best of three in three different ways:
- Winning the first two games.
- Winning the first and third games while losing the second.
- Losing the first game, but winning the next two.
If \(p_1\) is the chance of team 1 (Misfits in the example above) winning a single game, \(p_w\) is the chance of team 1 winning the series, and we treat the games as independent, then
$$ p_w=p_1^2 + p_1^2(1-p_1) + (1-p_1)p_1^2 $$
Calculating this probability for Misfits,
pvals^2+pvals^2*(1-pvals)*2
[1] 0.9339417
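The same counting argument extends to longer series, which matters here because the later rounds are best of five. As a sketch, assuming games are independent and the per-game win probability stays fixed, a team wins a best of n if it reaches the required number of wins before its opponent does. The helper name `series_win_prob` is my own and is not part of PlayerRatings.

```r
# Chance of winning a best-of-n series given a per-game win chance p.
# The team must take wins_needed games while losing at most
# wins_needed - 1 along the way; the last game played is always a win.
series_win_prob <- function(p, n_games) {
  wins_needed <- (n_games + 1) / 2
  losses <- 0:(wins_needed - 1)
  sum(choose(wins_needed - 1 + losses, losses) *
        p^wins_needed * (1 - p)^losses)
}

bo3 <- series_win_prob(0.8431877, 3)  # about 0.934, same as the formula above
bo5 <- series_win_prob(0.8431877, 5)  # a longer series favours the stronger team
print(c(bo3, bo5))
```

For n = 3 this reduces to the three-term formula above, so it can serve as a drop-in replacement when the bracket moves on to best-of-five matches.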
Now, a 93% chance of winning is very high, and in my mind is a reasonable representation of the skill difference between the best team in the West and an unknown one.
Now for the other three matches,
# matches to predict
game2.df <- data.frame(59, 'Team 8', 'Team Dignitas')
game3.df <- data.frame(59, 'Tempo Storm', 'Infamous')
game4.df <- data.frame(59, 'Fnatic', 'Gale Force eSports')
# predictions
pval2 <- predict(glicko.rat, game2.df, tng=1)
pval3 <- predict(glicko.rat, game3.df, tng=1, trat=c(2200,300))
pval4 <- predict(glicko.rat, game4.df, tng=1)
# chances of winning
c(pval2^2+pval2^2*(1-pval2)*2,
pval3^2+pval3^2*(1-pval3)*2,
pval4^2+pval4^2*(1-pval4)*2)
[1] 0.5426808 0.7958494 0.7875154
Predictions
- Misfits will win against Nomia (93% chance of winning)
- Team 8 might beat Team Dignitas (54% chance of winning)
- Tempo Storm will beat Infamous (80% chance of winning)
- Fnatic will beat Gale Force eSports (79% chance of winning)