Another weekend, another set of data points for the stat hounds. There are still some holes here, but last year we were able to get virtually everything for Kotei season, so I’m hopeful that will happen again this year (the pre-Kotei EE-legal tournaments that we don’t have the data for are probably just gone).
You may also see that I’ve squeezed in an extra column of data. I figured it might be more useful than keeping more decimal places in the existing column. The “relative chance of making cut” is intended to give a quick look at where a Clan stands relative to the field, regardless of the ups and downs of which cut system is used at which times. It can therefore also be used to compare the balance of one environment to another. The “average” figure at the bottom of the “relative chance” column would then be a measure of environment balance (note that it is the average of the magnitudes, so -20% and 20% are the same for these purposes): the higher the number, the greater the deviation between the Clans.
I’m not sure it’s a very good measure, mind you (it might be better to square the percentages, sum them, then take the square root; or to use the standard deviation of the Clans’ made-cut rates, which would be 6.81% right now), but it’s something at least. Note that it’s especially variable now, at the start of an environment, and it’s hard to assign any normative value to the numbers without comparison points from other environments. But this way I’ll start getting the numbers up now, and then we can go back and compare later.
One more possible limitation of the figure is that it goes deeper than a typical “feel” for balance does, so it might not line up with your (or my) impressions of whether an environment is balanced. In particular, a single really busted deck probably doesn’t move the “average” figure as much as one’s impressions suggest it should. I’m open to suggestions of a better way to assign a single number to the concept of “environment balance.”
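In symbols, here is a sketch of how the column and the balance figure appear to be computed. The notation is ours, not the post’s, and the definition of the relative chance is inferred from the table rather than stated (it does reproduce every value in it). For Clan i with n_i players, of whom c_i made the cut:

$$
p_i = \frac{c_i}{n_i}, \qquad \bar{p} = \frac{1}{9}\sum_{i=1}^{9} p_i, \qquad r_i = \frac{p_i}{\bar{p}} - 1, \qquad B = \frac{1}{9}\sum_{i=1}^{9} \lvert r_i \rvert
$$

The two alternatives floated in the paragraph are then

$$
\sqrt{\sum_{i=1}^{9} r_i^{2}} \qquad\text{and}\qquad \sigma = \sqrt{\frac{1}{9}\sum_{i=1}^{9} \left(p_i - \bar{p}\right)^{2}},
$$

where σ, taken with the 1/9 divisor (a population standard deviation), is the version that gives the 6.81% figure.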
Emperor Edition Environment
| Clan | Players | % of Field | Made Cut | Won | % Made Cut | Relative Chance of Making Cut |
|---|---|---|---|---|---|---|
| Crab | 120 | 15.7% | 36 | 4 | 30.0% | 42.5% |
| Crane | 63 | 8.3% | 7 | 0 | 11.1% | -47.2% |
| Dragon | 76 | 10.0% | 14 | 2 | 18.4% | -12.5% |
| Lion | 90 | 11.8% | 27 | 2 | 30.0% | 42.5% |
| Mantis | 78 | 10.2% | 17 | 4 | 21.8% | 3.5% |
| Phoenix | 80 | 10.5% | 22 | 1 | 27.5% | 30.6% |
| Scorpion | 89 | 11.7% | 17 | 2 | 19.1% | -9.3% |
| Spider | 74 | 9.7% | 8 | 1 | 10.8% | -48.6% |
| Unicorn | 82 | 10.7% | 17 | 1 | 20.7% | -1.5% |
| Unaligned | 11 | 1.4% | 1 | | 9.1% | |
| Total / nine-Clan average | 763 | | 166 | | 21.1% | 26.5% (average magnitude) |
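The derived columns and bottom-row averages can be recomputed from the Players and Made Cut columns alone. The minimal Python sketch below assumes that “relative chance” is each Clan’s made-cut rate divided by the unweighted nine-Clan average rate, minus one. That definition is inferred, not stated, but it reproduces every value in the column, along with the 21.1% average, the 26.5% balance figure, and the 6.81% standard deviation (a population standard deviation, dividing by 9). The helper names are illustrative.

```python
# Minimal sketch: recompute the derived columns of the Emperor Edition table.
# ASSUMPTION: relative chance = (Clan's made-cut rate / unweighted nine-Clan
# average rate) - 1. Unaligned is excluded from all averages, as in the post.
from math import sqrt

# (players, made cut) per Clan, copied from the table
CLANS = {
    "Crab": (120, 36), "Crane": (63, 7), "Dragon": (76, 14),
    "Lion": (90, 27), "Mantis": (78, 17), "Phoenix": (80, 22),
    "Scorpion": (89, 17), "Spider": (74, 8), "Unicorn": (82, 17),
}

def cut_rates(clans):
    """Fraction of each Clan's players who made the cut."""
    return {name: cut / players for name, (players, cut) in clans.items()}

def relative_chances(rates):
    """Each Clan's rate relative to the unweighted average of all Clan rates."""
    avg = sum(rates.values()) / len(rates)
    return {name: rate / avg - 1 for name, rate in rates.items()}

def balance_figure(rel):
    """Average magnitude of the relative chances (-20% and +20% count the same)."""
    return sum(abs(r) for r in rel.values()) / len(rel)

def rate_std_dev(rates):
    """Population standard deviation of the made-cut rates themselves."""
    avg = sum(rates.values()) / len(rates)
    return sqrt(sum((r - avg) ** 2 for r in rates.values()) / len(rates))

rates = cut_rates(CLANS)
rel = relative_chances(rates)
for name, r in rel.items():
    print(f"{name:9s} {rates[name]:6.1%} {r:+7.1%}")
print(f"average rate     {sum(rates.values()) / len(rates):.1%}")
print(f"balance figure   {balance_figure(rel):.1%}")
print(f"std dev of rates {rate_std_dev(rates):.2%}")
```

Because the average is unweighted, a small Clan counts as much as a large one. Weighting by player count instead would give a slightly different baseline (165/752 ≈ 21.9% without Unaligned).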
“Make the Cut %” Ranks
T-1) Crab – 30%
T-1) Lion – 30%
3) Phoenix – 28%
4) Mantis – 22%
AVERAGE 21%
5) Unicorn – 21%
6) Scorpion – 19%
7) Dragon – 18%
8) Crane – 11%
9) Spider – 11%
There was also one Oni deck that made the cut last weekend, which would give them a 9% rate. Note that the average figures given above are just averages of the nine Clans, and do not include that 9% Unaligned rate.
Strange Assembly: Your Math HQ.
Dig the post. I think it’s interesting that in both Crab’s and Lion’s cases, their numbers are partly bolstered by having two (or more) competitive decks. Lion’s done well with Paragons, Tactigons, Paragon honor and Ancestor honor. Crab’s done well with KHxp and, to a lesser extent, Berserkers. The only other clan that really stands out as having two competitive decks is Mantis, with both ThunderWomens and Kalani’s money doing well.
And Phoenix, where spell military and ToP honor both regularly place highly.
If I’m reading the numbers right, relative chance of making cut = (chance of making cut) / (average chance of making cut) - 1. I’m not sure this is a helpful metric, as I think it exaggerates the difference. I think you also have a problem trying to assess balance based on results, because they’re so skewed by individual players (Alex accounts for 1/7 of the Dragon made-cut total, and both Dragon wins) and by the effect of bandwagoning (many of the top players in Chicago were playing Kitsu honor, and top players are, I believe, more likely to switch factions than people who play the game less).
Or to put it another way, your numbers are going to reflect what the top players believe to be the strongest and weakest factions based on their playtesting in their groups. That doesn’t necessarily mean it’s wrong, but it tells you less about environment strength, and more about the strength of decks that have actually been discovered and shared widely.
Good point. Good players will find the decks with the most edge and play them hard, which probably amplifies the actual strength of the cards a bit. It wouldn’t change who the winners and losers are, but it would exaggerate the gaps.
Yeah, relative chance of making the cut is most helpful when comparing things to the average, just to see whether a Clan is positive or negative. You obviously understand the math, but I can see how people who don’t might see huge numbers like 42.5% and -47.2% and have their minds blown.
And while the numbers are affected heavily by things like bandwagoning and individual players making a number of strong showings, I don’t know of any kind of numeric metric which will do any better. We’ve talked about that in a few episodes in the past, things like the importance of perception of a clan as ‘strong’ or ‘weak’, or the availability of a strong deck. For example, last arc, Scorpion got vastly better when the Dishonor ‘Burn’ deck came to light and people could steal the decklist.
Well, maybe you should have your mind blown when you see numbers like plus or minus 40%. Remember that if Clan X is at +50% and Clan Y is at -50%, that means Clan X players have made the cut three times as frequently as Clan Y players. That’s a pretty big real-life difference. For reference, Forgotten Temple was at something like +85% before that deck got nerfed, and after that Crane was above +116%.
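The three-times figure follows because a relative chance r means a made-cut rate of (1 + r) times the average, so the average cancels when two Clans are compared. Applied to the extremes of the current table (Crab at +42.5%, Spider at -48.6%), the same step gives a ratio of roughly 2.8:

$$
\frac{1 + r_X}{1 + r_Y} = \frac{1 + 0.50}{1 - 0.50} = 3, \qquad \frac{1 + 0.425}{1 - 0.486} = \frac{1.425}{0.514} \approx 2.77
$$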
James, your comment about good players shifting applies to the standard numbers anyway. There is definitely a reasonable argument that obviously good and obviously bad Clans have their numbers pulled a bit further from the mean by players shifting to or from those Clans. As for results being skewed by individual players, that’s true of all Clans, so there’s no particular reason to think it doesn’t mostly wash out in the end, barring exceptional circumstances. One guy making the cut twice with one Clan isn’t exactly a big deal (you’re in there twice for Phoenix, for example). As for wins, I consider them relatively unimportant from a statistical perspective anyway; I’m including them this arc because people wanted to see them.
As for the value of the “relative chance,” when I say I don’t know whether it is the best measure, the biggest thing I’m thinking of is that because it weights every Clan equally, it doesn’t place as much emphasis on the existence of a deck that is way too good or really terrible. For example, if you have one broken deck that’s way above average but everyone else is clumped together, we’d probably think of that environment as unbalanced, but all the “normal” decks will tend to keep the average down.
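To make the “one broken deck” concern concrete, here is a minimal Python sketch with two made-up, hypothetical nine-Clan environments (not tournament data). Their relative chances are chosen to sum to zero, as real ones must. Both have the same average magnitude, about 17.8%, so the post’s balance figure cannot tell them apart. The square, sum and square-root alternative floated in the post does separate them, and `max_abs` (a hypothetical extra, not proposed in the post) tracks the single outlier most directly.

```python
# Hypothetical illustration with made-up numbers (NOT tournament data):
# two nine-Clan environments whose relative chances have the same average magnitude.
from math import sqrt

one_broken_deck = [0.80] + [-0.10] * 8            # one Clan far ahead, the rest clumped
evenly_spread = [0.20] * 4 + [0.0] + [-0.20] * 4  # moderate gaps everywhere

def mean_abs(rel):
    """The post's balance figure: average magnitude of the relative chances."""
    return sum(abs(r) for r in rel) / len(rel)

def root_sum_square(rel):
    """Alternative floated in the post: square, sum, then take the square root."""
    return sqrt(sum(r * r for r in rel))

def max_abs(rel):
    """Hypothetical extra measure: the largest single deviation from the average."""
    return max(abs(r) for r in rel)

for label, rel in [("one broken deck", one_broken_deck), ("evenly spread", evenly_spread)]:
    print(f"{label:16s} mean|r|={mean_abs(rel):.3f} "
          f"rss={root_sum_square(rel):.3f} max|r|={max_abs(rel):.2f}")
```

Squaring weights large gaps more heavily, which is why the one-outlier environment scores higher on the root-sum-square measure (about 0.85 against 0.57) even though the average magnitude is identical.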
It’s quite (OK, very) geeky, but there are already statistical measures that can determine whether the differences in the rate of making the cut are actually significant. I believe that, given the size and nature of this data set, a chi-squared test is most appropriate.
Using the raw made-cut data (i.e. not controlling for the number of players) yields a chi-square value of 36. This is very highly significant for 8 degrees of freedom and indicates a highly unbalanced environment.
Adjusting the made-cut data by player count (scaling each Clan’s cuts to the average Clan size of 752/9 players, still excluding Unaligned, so Crab’s total moves from 36 to 25.07) and repeating the test yields a chi-square score of 16.14, narrowly over the 15.51 critical value for the 5% level with 8 degrees of freedom. This still indicates an unbalanced environment: it says that if the factions were perfectly balanced, there would be less than a 1 in 20 chance of observing a distribution at least this uneven. Interestingly, Spider and Crane are the largest contributors, i.e. the problem is not that some factions are too strong, it’s that others are too weak. If two of the Crab players who made the cut had instead been playing Crane or Spider, the significance of the results would vanish at a stroke.
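The significance readings can be checked without a statistics package. For an even number of degrees of freedom k, the chi-square upper-tail probability has the closed form P(X > x) = e^(-x/2) · Σ_{j=0}^{k/2-1} (x/2)^j / j!, which the minimal Python sketch below implements. The function name `chi2_sf_even_df` is hypothetical; with SciPy available, `scipy.stats.chi2.sf(x, df)` should give the same values.

```python
# Minimal sketch: upper-tail probability of a chi-square statistic when the
# degrees of freedom k are even, via the closed form
#   P(X > x) = exp(-x/2) * sum_{j=0}^{k/2 - 1} (x/2)^j / j!
# Standard library only.
from math import exp, factorial

def chi2_sf_even_df(x, df):
    """P(chi-square with df degrees of freedom > x); df must be positive and even."""
    if df <= 0 or df % 2:
        raise ValueError("closed form needs a positive, even df")
    half = x / 2
    return exp(-half) * sum(half ** j / factorial(j) for j in range(df // 2))

for stat in (15.51, 16.14, 36.0):
    print(f"chi-square {stat:5.2f}, 8 df -> p = {chi2_sf_even_df(stat, 8):.2g}")
```

With 8 degrees of freedom this gives p of about 0.05 at 15.51, about 0.04 at 16.14, and well below 0.0001 at 36, consistent with the readings above.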
So what does all this mean? Assuming all my maths is correct, the conclusions I draw are:
– the data shows an unbalanced environment
– but if the best players really are bandwagoning, then the actual cards may be balanced after all (technically, differing playskill among players of different factions breaches the chi-squared test’s assumption that the data are drawn by simple random sampling; but in the absence of any means of controlling for playskill, we have to add the assumption that players of all factions are equally skilled).
For the interested, the chi-square contributions I get for each faction are:
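| Clan | Raw | Adjusted for # players |
|---|---|---|
| Crab | 17.02 | 2.47 |
| Crane | 7.01 | 4.47 |
| Dragon | 1.02 | 0.47 |
| Lion | 4.10 | 2.47 |
| Mantis | 0.10 | 0.00 |
| Phoenix | 0.73 | 1.18 |
| Scorpion | 0.10 | 0.31 |
| Spider | 5.82 | 4.72 |
| Unicorn | 0.10 | 0.06 |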
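The per-faction figures can be reproduced with the minimal Python sketch below. The method is inferred from the numbers rather than stated in the comment, so treat it as an assumption. Unaligned is dropped (165 cuts among 752 players), every Clan’s expected count is 165/9 in both columns, and the “adjusted” observed count is cuts × (752/9) / players. For comparison, `homogeneity_chi2` (a hypothetical helper) runs a conventional chi-square test of homogeneity on the 2 × 9 table of made-cut versus missed-cut players. That is a swapped-in technique, not the commenter’s method.

```python
# Minimal sketch reproducing the per-Clan chi-square contributions quoted above.
# ASSUMPTIONS (inferred from the numbers, not stated by the commenter):
#   - Unaligned is excluded: 165 cuts among 752 players in nine Clans;
#   - the expected count for every Clan is 165 / 9 in both columns;
#   - "adjusted for # players" scales cuts to the average Clan size:
#     cuts * (752 / 9) / players, which turns Crab's 36 into 25.07.

# (players, made cut) per Clan, from the Emperor Edition table
CLANS = {
    "Crab": (120, 36), "Crane": (63, 7), "Dragon": (76, 14),
    "Lion": (90, 27), "Mantis": (78, 17), "Phoenix": (80, 22),
    "Scorpion": (89, 17), "Spider": (74, 8), "Unicorn": (82, 17),
}

def contributions(observed, expected):
    """Per-Clan (O - E)^2 / E against a single common expected count."""
    return {name: (obs - expected) ** 2 / expected for name, obs in observed.items()}

total_players = sum(players for players, _ in CLANS.values())  # 752
total_cuts = sum(cuts for _, cuts in CLANS.values())            # 165
expected_each = total_cuts / len(CLANS)                          # about 18.33
avg_size = total_players / len(CLANS)                            # about 83.56

raw = contributions({n: c for n, (p, c) in CLANS.items()}, expected_each)
adjusted_counts = {n: c * avg_size / p for n, (p, c) in CLANS.items()}
adjusted = contributions(adjusted_counts, expected_each)

for name in CLANS:
    print(f"{name:9s} {raw[name]:6.2f} {adjusted[name]:6.2f}")
print(f"{'total':9s} {sum(raw.values()):6.2f} {sum(adjusted.values()):6.2f}")

def homogeneity_chi2(clans):
    """Swapped-in alternative (NOT the commenter's method): chi-square test of
    homogeneity on the 2 x 9 table of made-cut vs missed-cut players, with
    expected counts proportional to each Clan's player count."""
    players = sum(p for p, _ in clans.values())
    cut_rate = sum(c for _, c in clans.values()) / players
    stat = 0.0
    for p, c in clans.values():
        for observed, expected in ((c, p * cut_rate), (p - c, p * (1 - cut_rate))):
            stat += (observed - expected) ** 2 / expected
    return stat

print(f"homogeneity statistic: {homogeneity_chi2(CLANS):.2f} on 8 df")
```

On these numbers the homogeneity statistic comes out at about 20.1 on 8 degrees of freedom (p around 0.01), pointing in the same direction as the adjusted test. Like any chi-square test, it still assumes each player is an independent draw, which is exactly the assumption the bandwagoning discussion calls into question.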
Yes, as you astutely point out, chi-square analysis shows that the tournament results are too far from balanced to be the result of nine totally balanced factions. (With the numbers we have, you might even be able to do the same analysis on tournament WINS when the arc is over and show they’re not random.)
But unfortunately, the numbers don’t say anything about ‘why’. As we saw last arc, Scorpion had a dishonor burn deck sitting around which was a perfectly good deck, but people weren’t playing it until a very good player made the decklist public. Scorpion didn’t suddenly get ‘better’, but the numbers showed an increase in their performance.
Similarly, we have no way of knowing if, for example, Spider numbers are low because their cards are weak, or because all the good players are playing the other clans, or because a strong deck in the environment happens to be a terrible match-up for Spider, and so on.