Balance in the System, Part I

Greetings Rokugani!

I’m back after a long hiatus! My semester is finally done, and my grades have been turned in, so I can once again turn my attention to my new-found L5R hobby. When I started going through my compiled Kotei results (which you can find here), I couldn’t fail to notice the many successes of the Mantis Clan – for a while it really seemed like they were smashing all comers with no end in sight! The percentage of Koteis won by Mantis decks blew past the 40% mark and shot up to the mid-40s before tapering off to its present “measly” 33.3% (12 wins out of 36 events). Looking at figures like this, I started to wonder if the current iteration of the game we all play and enjoy is really balanced.

And clearly I’m not the only one considering this question. Indeed, the AEG forums are replete with grumblings about the success and power of the Mantis (although many disgruntled individuals seem to have quieted down in the wake of the announcement that Mantis won’t be getting a stronghold in Onyx edition). Some proclaim that the combination of free gold when going second, aggressively priced personalities, and reliable card draw is just too good. Others protest that while there is a fine line between “good” and “broken,” that line hasn’t been crossed yet. There have been calls for errata on the Mantis Stronghold and/or Shika Sensei, banning of certain Mantis personalities, and a host of other well-intentioned suggestions. All in the name of “bringing balance to the game.” And that brings me to today’s topic – balance, and specifically whether the game is balanced as it currently is.

[I should interject a warning here. This article became WAY longer than it was intended to be. Indeed, it’s been split into two articles, but even so, this first piece is long and occasionally pedantic. It includes an in-depth discussion about what balance is, what it should be, how we can/should measure it, and how we can infer it from the limited data set we actually have. If you aren’t interested in reading all of that, simply skip down to Section 3. Go ahead, I won’t judge you… much. 🙂 ]

Section 1 – What is Balance?

Being scientifically minded and trained, I knew that the first thing I would need to do is figure out a way to measure the balance of the game, and in order to do that, I’d have to carefully define balance as it applies to L5R. Although simple-sounding, this is actually a rather challenging task. And the difficulty lies in the fact that we have an innate sense of fair play. It’s biologically and/or socially programmed into us, and we certainly aren’t the only species with this sense of fairness. If you give unequal rewards to two different brown Capuchin monkeys who performed the same task, for instance, the monkey who received the lesser reward will refuse to perform the task again, even though it means he will get nothing. The situation intensifies if one monkey is given a minor reward for a task, while a second is given a major reward for no effort – the situation may even turn violent! (If you want to know more about this study – go here.)

Even this guy knows what fairness is – I’m sure that figuring out balance for L5R won’t be that difficult!

While this sense of fairness and equity is very keen, it’s based on the idea that we “know fairness when we see it.” To put it another way, our evaluation of equality and fairness is based on our perceptions which we analyze with our intuition, and we allow both of these to be colored by our emotions. Unfortunately for us, these three things – perceptions, intuition, and emotion – are easily swayed by insubstantial or irrelevant factors. In order to avoid these sorts of biases, we will need to construct a careful and measurable definition of fairness in L5R.

There is more than one kind of balance to consider. The first type is what I’ll call “Card Balance” – which is nothing more than a straightforward evaluation of two cards side by side. Consider Matsu Tayuko and Yoritomo Yakuwa. Both are personalities that cost 3 gold. Both have honor requirements below the starting family honor for their respective factions. The differences are that Yoritomo Yakuwa always has 3 Force (as opposed to only while attacking), has 1 fewer Chi, has a personal honor of 0 (as opposed to 3), and he is a Scout and has Naval. This means that unless you are planning to utilize personal honor, Yoritomo Yakuwa is likely to be the superior card.

Matsu

Yoritomo

But Card Balance isn’t really what L5R players are thinking of when they talk about this Clan or that Clan dominating the tournament scene. They are talking about Faction Balance – a concept that says that all factions in a game should be equally competitive against each other. But what exactly does that mean and how do we measure it? One possible definition of Faction Balance might be that each faction would win an equal number of games. But this quickly runs into problems – what happens if there are twice as many Lion players as compared to Crab players? Should the Lion lose more often simply because there is a greater representation of the Lion Clan? I think most players would agree that this concept isn’t the balance that is desired.

So what if we say that each faction should win games in proportion to that faction’s representation (meaning that factions with equal representation should win an equal number of games)? Well, assume that Mantis and Phoenix are equally represented – this means that when any Mantis player sits down across from any Phoenix player, each player should have an exactly equal chance of winning. The game effectively just becomes a coin toss. Clearly, this definition removes advantages in player skill and deck construction, and I think that most players would like these to be key factors in deciding who wins and who loses.

How about: players of equal skill playing the best deck available for their faction should win games equally often. This may be the best definition of balance yet, but it assumes that each faction has a single deck that is optimal in every given situation. That gets us into all sorts of problems, including that even if there is a “best” deck for a faction, not all the players will want to play it. Just consider that the game possesses four different victory conditions, meaning that different players within a single faction may want to pursue different victory objectives. The Crab Clan is a great example – although their main victory condition is clearly military, there exists a sizable group who want to be able to dishonor their opponent with their Yasuki courtiers, and others who would like to demonstrate how honorable they are with the Kaiu family and their superb fortifications. That’s three fairly popular victory conditions for a single clan, and that means that if you want to make the Crab players happy, you need to provide at least three equally optimal decks – one for each victory condition! (Note that this doesn’t even take into account different paths to the same victory condition! Clearly a Kuni shugenja military deck will be very different from a Hida Berserker military deck – do all these need to be balanced in addition to the different factions?)

But there is a far larger problem with this definition: it’s no longer measurable. Or at least not in the real world, and certainly it’s beyond my ability to measure from the confines of my living room. We don’t have data from matches that feature players of equal skill playing with the best decks available to their faction. We have Kotei results, which feature players of widely differing levels of skill playing with decks of widely different composition. To make matters worse, the deck that wins an event is certainly not the most optimal deck that can be built by that faction, but is instead the deck that was best tuned to beat the 6-8 other decks that it faced during the event. Consider a deck that faces nothing but rocket honor decks, and manages to beat them all. Clearly that deck is well tuned to beat honor decks, but it is unclear how well that deck would have stood up against military or dishonor decks.

Section 2 – Inferring Balance from Data

Huh – what were the odds?

So maybe I had it all backwards. Maybe we shouldn’t try to carefully define our ideal of faction balance, and then figure out a way to measure it. Maybe we should look at the data we have, and see how we can use this to infer faction balance. The first way to do this might be to simply look at the number of wins that each clan has during a Kotei season (and now we’re right back to where we started – we’ve come full circle, and still haven’t gotten anywhere! But stay tuned, we’re starting to zero in on this issue!).

The problem with simply looking at the number of wins that each faction has collected is one of sample size. We’ve only had 33 events, and we want to measure balance between nine different factions, meaning that in a perfect world, each clan would have won 3.67 events. That’s not very many data points. Let me put it another way. If we take a perfectly weighted six-sided die, we know that in the long run it should land on each face an equal number of times, but if we only roll it 6 times, the odds that we will actually roll each of the 6 faces exactly once are only ~1.5% (1 * 5/6 * 4/6 * 3/6 * 2/6 * 1/6)! This means that we need to be dealing with a large sample size, and there just aren’t enough L5R tournaments in this or any Kotei season to be able to use those numbers.
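For anyone who wants to check the die arithmetic, here is a minimal Python sketch. The function name is my own invention for illustration; it simply counts the 6! orderings in which every face appears once out of the 6^6 equally likely sequences of six rolls.

```python
from math import factorial


def prob_all_faces_once(sides: int = 6) -> float:
    """Chance that `sides` rolls of a fair die show every face exactly once."""
    # sides! favourable orderings out of sides**sides equally likely sequences,
    # which is the same product as 1 * 5/6 * 4/6 * 3/6 * 2/6 * 1/6 for a d6.
    return factorial(sides) / sides ** sides


p = prob_all_faces_once(6)
print(f"{p:.4f}")  # 0.0154
```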

The only other data set we have is how often a faction is able to make it past the Swiss rounds and into the Elimination rounds (making it “past the cut”). Thus far in the 2015 Spring Kotei season, 227 decks have made it past the cut. That means that each clan could expect ~25 decks to have made it past the cut. That’s a dramatic change in our sample size, and it gets us to a large enough data set that we can start having some confidence in our conclusions.

But we can’t just look at how many decks from each clan made it past the cut – this fails to take into account attendance (as we discussed earlier). In a perfectly balanced world, the makeup of the post-cut environment would look the same as the pre-cut environment. So instead, I’m going to look at a ratio of “the percentage of Clan X in the post-cut group” divided by “the percentage of Clan X in the attendance.” So that I don’t have to keep writing out that whole thing, I’ll refer to this odds ratio as the chance of making the cut. In our dream scenario above, each clan’s ratio would be 1 – they make it past the cut in the same percentage as they entered the tournament. The further from 1 a clan goes (either up or down), the less balanced it is. Numbers below 1 would indicate that the clan has a hard time getting past the cut (a “weak” clan), and numbers above 1 indicate that the clan is making the cut more often than they should (a “strong” clan).
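As a rough sketch of how this ratio is computed, the Python below divides a clan's share of the post-cut field by its share of attendance. The function name and the example numbers are hypothetical, chosen only to illustrate the arithmetic.

```python
def cut_ratio(attended: int, total_attended: int,
              made_cut: int, total_cut: int) -> float:
    """Share of the post-cut field divided by share of the attendance."""
    attendance_share = attended / total_attended
    cut_share = made_cut / total_cut
    return cut_share / attendance_share


# Hypothetical clan: 120 of 1000 attendees, 30 of the 200 decks past the cut.
ratio = cut_ratio(attended=120, total_attended=1000, made_cut=30, total_cut=200)
print(f"{ratio:.2f}")  # 1.25, i.e. over-represented after the cut
```

A value of exactly 1 would mean the clan makes up the same fraction of the elimination rounds as it did of the starting field.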

Section 3 – Report Card

OK! Now we’re getting someplace! Let’s look at those numbers! Well, there is one more step we have to do before we do that. We have to think about how to interpret the numbers we get. And this is tricky, because math can’t tell us how perfect the game has to be in order to be given the “BALANCED” stamp of approval. Ultimately we each have our own tolerance levels for what we consider to be “appropriately” balanced, and what is unacceptable. But with all that said, allow me to propose a few thresholds that might allow us to distinguish between “balanced” and “not balanced” factions. I’ll also take this opportunity to assign a point score to each level, allowing us to measure the average balance of the factions in L5R.

1) Any clan within 10% of the ideal (0.9-1.1) is extremely well-balanced, and I would expect that much of the variation from the ideal is due to chance, rather than balance issues. Ideally, for me, this is where all clans would be. I will award the Design Team 2 points for each clan that makes it into this category.

2) Any clan 11 – 20% from the ideal (0.8-0.9 and 1.1-1.2) is acceptably balanced. Chance and card power play nearly equal roles in the variation from the ideal. Realistically, given the difficulties in balancing so many factions, this is where I would expect all clans to be. I will award the Design Team 1 point for each clan that makes it into this category.

3) Any clan 21 – 35% from the ideal (0.65-0.8 and 1.2-1.35) is not balanced. Card power plays a significant role in the variation from the ideal experienced by these clans. While I may not be happy about this category, I do realize that clans will occasionally slide into it. The first clan that slips to this category will net the Design Team 0 points, but for each subsequent clan, they will lose 1 point.

4) Any clan that is more than 35% from the ideal is a problem – it represents either a single clan that wins so often that it is format-defining, or a clan that is so hampered by its poor card pool that it is effectively unable to compete. As this category represents a major detriment to the game, I will subtract 3 points from the Design Team for each of these clans.
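The four categories above can be sketched as a small Python scoring function. This is only one reading of the rules: it assumes ratios are rounded to two decimals first and that a ratio sitting exactly on a boundary (0.90, 1.10, 0.65 and so on) counts toward the better category. The function name and the sample ratios are made up for illustration.

```python
def score_clans(ratios: dict[str, float]) -> int:
    """Total Design Team points for a mapping of clan name -> cut ratio."""
    total = 0
    not_balanced_seen = 0  # clans already counted in the 21-35% category
    for ratio in ratios.values():
        # Deviation from the ideal of 1.0, as a whole-number percentage.
        deviation = round(abs(ratio - 1.0) * 100)
        if deviation <= 10:        # category 1: extremely well-balanced
            total += 2
        elif deviation <= 20:      # category 2: acceptably balanced
            total += 1
        elif deviation <= 35:      # category 3: first clan is free, then -1 each
            if not_balanced_seen > 0:
                total -= 1
            not_balanced_seen += 1
        else:                      # category 4: format-defining or unplayable
            total -= 3
    return total


# Hypothetical five-faction game: 2 + 1 + 0 - 1 - 3
example = {"A": 1.05, "B": 0.85, "C": 0.70, "D": 0.75, "E": 1.50}
print(score_clans(example))  # -1
```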

Always with the complications!

This point scoring means that the Design Team could theoretically score 18 points (2 for each clan), but that would truly be a miracle, and I certainly don’t expect them to put every clan in the bulls-eye, as it were. A more reasonable goal, I think, is to achieve 9 points, which would mean that every clan (on average) fits into the “acceptably balanced” category. So with that in mind, I will judge how well the Design Team has done against a target score of 9 points. It’s worth noting here that my point system is designed to reward careful balance, is lenient to mild imbalance, and very harshly penalizes format-defining or unplayable factions. But this point system is clearly biased towards what I’m looking for in a game and what I find acceptable. You may need to play with the thresholds and the point scores to make things fit your biases.

I also need to briefly discuss the data set itself. The problem with the composite 2015 Spring Kotei data is that it includes tournaments from five very different formats (Ivory Strict / Arc and 20 Festivals Strict / Arc / Extended). How appropriate is it to lump all of these data together? Honestly, it’s not ideal. However, my original stated goals were 1) to measure how balanced the game currently is and 2) to compare that to previous iterations of the game. That means that I will need to apply the metric I just discussed above to previous years’ Koteis. And the data sets I have to work from do not include information about which tournament was played with what format. Even if I were to look up all of this information (which I could certainly do), I’d also have to contend with tournaments before and after the errata or banning of important cards. Consider the 2014 Kotei Season – I think it is pretty safe to say that the Ivory-legal events before the errata to the Crane Stronghold and Akagi Sensei were very different from the events that came after! Should those be considered as different data sets? The fact is that it is very difficult to decide which tournaments are the “same” environment and which are “different.” This is especially true because I don’t even have personal experience to fall back on (I’m new enough that I didn’t play in those environments!). Given all of these complications, I will consider all of 2015 to be a single data set (as I will consider all of 2014, and all of 2013, and so on), but I will also look at just the 20 Festivals Arc events to see how they stack up to previous years. Just keep in mind that it isn’t entirely fair to compare the 20 Festivals Arc data to the composite data from an entire year.

Ok, enough yammering! Let’s finally look at the data for the 2015 Spring Kotei Season!

Combined 2015 Spring Kotei Results:

Clan | Attendance | Attendance % | Made Cut | Made Cut % | Odds Ratio | Points
Crab | 113 | 9.23 | 22 | 8.98 | 0.97 | +2
Crane | 139 | 11.36 | 29 | 11.84 | 1.04 | +2
Dragon | 139 | 11.36 | 25 | 10.20 | 0.90 | +2
Lion | 138 | 11.27 | 25 | 10.20 | 0.91 | +2
Mantis | 147 | 12.01 | 42 | 17.14 | 1.43 | -3
Phoenix | 139 | 11.36 | 31 | 12.65 | 1.11 | +1
Scorpion | 123 | 10.05 | 23 | 9.39 | 0.93 | +2
Spider | 153 | 12.50 | 20 | 8.16 | 0.65 | 0
Unicorn | 133 | 10.87 | 28 | 11.43 | 1.05 | +2
Total | 1224 | 100 | 245 | 100 | 1.00 | 10

Looking at these numbers, we can see that 6 clans – Crab, Crane, Dragon, Lion, Scorpion, and Unicorn – all make it into the highest category of balance! The Design Team gets awarded 12 points! Phoenix makes it into the category of acceptable balance, granting the Design Team another point. The Spider sit alone in the third category of balance, indicating that their card pool is moderately underpowered, but as the first clan in that category they cost the Design Team nothing. And then finally we get to Mantis, who alone are in the unacceptable category, causing the Design Team to lose 3 points.

That brings the score of the Design Team to 10 out of a possible 18 points for the 2015 Season, one point above my target of 9! (Plus, note that 6 out of 9 factions are rated as “Extremely Balanced.”) By this metric, the 2015 card pool is extremely well balanced, and I think the Design Team deserves an A. Note that the season isn’t over yet, and several clans are on or near the thresholds that I’ve set up. It will be interesting to see if Dragon can hold onto their rating. Phoenix has a real chance of falling back into the extremely balanced category. And most frightening, Spider is in real danger of slipping down into the unacceptably weak category, which would significantly impact the total balance score.

But what about the reason I started writing this mammoth document? That 33.3% win percentage by the Mantis clan? Well, here again we come to the dangers of small sample size. If the Mantis victories were proportional to how often they get past the cut (keep in mind that they make it past the cut much more often than they should!), they would have won ~6 events by now. So they’ve won twice as many events as expected, but if you rolled a d6 twelve times, would you really be that surprised if you rolled a single result four times instead of two? Now, all that having been said, the Mantis clan is inside my “unacceptably unbalanced” category, and does clearly represent a failure of the Design Team. Whether they are so “unacceptably unbalanced” as to require errata or bannings is a question for wiser and more experienced heads than mine. But the point is that the number of Mantis wins provokes an emotional response in us which colors our perceptions of the field in general. If we want to examine how balanced things really are, we have to get past that initial emotional reaction and really ask ourselves which conclusions are reasonable to draw from the number of Mantis victories, and which are not.
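If you would like to put a number on the d6 question, here is a small Python sketch using a plain binomial tail. The helper name is my own. The second call treats each of the 36 events as an independent trial in which Mantis wins with probability equal to their 17.14% share of the post-cut field, which is a simplifying assumption for illustration rather than a claim about how tournaments actually work.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# One chosen face coming up at least 4 times in 12 rolls of a fair d6.
d6_tail = prob_at_least(4, 12, 1 / 6)
print(f"{d6_tail:.3f}")  # 0.125

# Same idea with the Kotei numbers: at least 12 wins in 36 events,
# ASSUMING independent events and a per-event win chance of 42/245.
mantis_tail = prob_at_least(12, 36, 42 / 245)
print(f"{mantis_tail:.4f}")
```

How surprising either result is depends on the threshold you are comfortable with, which is exactly the kind of judgment call discussed above.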

Stay tuned for the next article where I apply this metric of balance to the 2011 – 2014 Kotei seasons to see how this year stacks up to previous years. I’ll also investigate whether the Twenty Festivals Arc format is as balanced as the composite 2015 Spring Kotei data would suggest. All that, and more, next time!

Until then, enjoy your casual games, go to your local Kotei (attendance is on the rise, but let’s push it even more!), and enjoy all the wonderful inter-clan conflict being stirred up by this Kotei Season!
