Faction Balancing - Statistical Analysis of WNL 2013 Matches
Since it has been argued many times that the factions in Warband are not balanced and that the balancing needs to be improved, I have conducted a statistical analysis of the WNL 2013 matches to base this argumentation on empirical data. Up to now, the argumentation has been based on the players' personal feelings and experiences rather than on valid empirical evidence. However, opinions differ, so I feel there is a need for empirical evidence that supports or falsifies the individual assumptions about the amount and kind of imbalance between the factions.
1. Methods
1.1 Leading question
Do different factions have an impact on the match outcome independent of the skill of the teams (players)? Or in other words: Are the factions balanced?
Further assumptions and theoretical background:
The WNL is a kind of measurement of the skill level of the participating teams (players). Teams (players) with higher skill should win more often, regardless of the faction they have to play with. Each faction is played equally often by both teams, because each team plays the same number of rounds with each of the two available factions during a match. Under these conditions you can analyse whether the factions influence the match outcome across all teams (weak and strong ones). If the factions are balanced, each faction should win 4 rounds on average across all teams and matches: on one map, both factions take part in all 8 rounds (each team plays each faction for 4 rounds), so a balanced faction should win half of them (Note: the 2 maps played in each WNL match were treated as separate data sets, because different maps have different conditions).
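A quick way to see where the expected value of 4.0 comes from: if every round on a map were an independent coin flip between the two factions, each faction would win 8 × 0.5 = 4 rounds on average. The simulation below is a minimal sketch that relies entirely on those simplifying assumptions (independent rounds, no draws). The function and its parameters are illustrative and not part of the original analysis.

```python
import random


def simulate_rounds_won(n_maps: int, p_win: float = 0.5,
                        rounds_per_map: int = 8, seed: int = 1) -> float:
    """Average number of rounds one faction wins per map.

    Assumes (for illustration only) that every round is won independently
    with probability p_win and that there are no draws.
    """
    rng = random.Random(seed)
    total_won = 0
    for _ in range(n_maps):
        total_won += sum(1 for _ in range(rounds_per_map) if rng.random() < p_win)
    return total_won / n_maps


# A perfectly balanced matchup (p_win = 0.5) should average close to 4.0.
print(simulate_rounds_won(50_000))
```

With p_win = 0.55, i.e. a faction that wins 55% of its rounds, the average rises to about 4.4 (8 × 0.55). This shows how a small per-round advantage shifts the mean number of rounds won.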
Since the factions have different advantages and disadvantages in the three available troop classes (infantry, archers/crossbowmen, cavalry), and these advantages can be exploited to a greater or lesser extent on open vs. closed maps, the map type might also influence the impact of the factions on the battle outcome. Therefore, the map type was included in the analyses to test whether it moderates the factions' impact on the outcome.
Descriptive statistics are used to describe the main features of a collection of data quantitatively. They are distinguished from inferential statistics (which were used in this analysis as well). Inferential statistics use the data to learn about the population that the sample of data is thought to represent. In other words, if you have an assumption or hypothesis about an effect or difference (in our case: the factions are imbalanced) in a population (in our case: all matches ever played in Warband), you can use inferential statistics to test, based on a sample of data, whether your assumption applies to the population. Usually it is not possible to analyse the whole population; in our case, we simply don't have the results of every match ever played in Warband. So we need to test our assumption on a sample of data that is assumed to be representative of the population (in our case, the WNL 2013 match results). However, a difference observed in such a sample may simply be due to chance and not reflect a real difference in the population. Inferential statistics therefore determine the probability of error (p): the probability of observing a difference or effect at least as large as the one in the sample if no such difference or effect existed in the population.
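A permutation test is one way to make the idea of a "probability of error" tangible. Note that it is not the test used in this analysis, which was an ANOVA. The idea is to shuffle the faction labels many times and count how often a spread between the group means at least as large as the observed one arises by chance alone. The data and function names below are made up for illustration.

```python
import random
from statistics import mean


def mean_spread(groups: dict[str, list[float]]) -> float:
    """Difference between the highest and the lowest group mean."""
    means = [mean(values) for values in groups.values()]
    return max(means) - min(means)


def permutation_p_value(groups: dict[str, list[float]],
                        n_perm: int = 5000, seed: int = 0) -> float:
    """Share of random relabellings whose spread of group means is at least
    as large as the observed one (a permutation estimate of p)."""
    rng = random.Random(seed)
    observed = mean_spread(groups)
    pooled = [x for values in groups.values() for x in values]
    sizes = [len(values) for values in groups.values()]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign the observed values to the groups
        shuffled, start = {}, 0
        for name, size in zip(groups, sizes):
            shuffled[name] = pooled[start:start + size]
            start += size
        if mean_spread(shuffled) >= observed:
            hits += 1
    # +1 in numerator and denominator counts the observed labelling itself
    return (hits + 1) / (n_perm + 1)


# Made-up rounds won for two factions, 6 maps each (illustration only).
fake = {"A": [5, 4, 6, 5, 4, 5], "B": [3, 4, 3, 2, 4, 3]}
print(permutation_p_value(fake))
```

In this made-up example, a spread as large as the observed one turns up in only around 1% of random relabellings, so the difference would count as significant at the 0.05 level.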
1.2 Data (sample): Which data were analysed?
The match results in WNL 2013 were confirmed by screenshots that show how many rounds each faction won during a match (see WNL Archive or WNL Week Fixtures). I have included all data available until 27th July. I will reanalyse the data and include the matches that are still missing as soon as WNL 2013 is finished. However, since only a few matches are still missing, the results shown below are unlikely to change much.
Remember, the WNL matches took place before patch 1.157 was released, and even after its release the WNL continued to use the 1.153 settings.
The spoiler below gives further information on how missing data were handled (e.g., no screenshots available, some screenshots missing, default win).
- if one screenshot was missing, the missing values were calculated from the total match result and the 3 available screenshots
- if more than one screenshot was missing, only the available screenshots were included in the analyses
- default wins were not included in the analyses
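The first rule is simple arithmetic. The sketch below assumes, since the post does not spell this out, that a team's match total is the sum of the rounds it won on all 4 screenshots (2 maps × 2 sets of 4 rounds). Under that assumption, one missing screenshot can be filled in by subtraction, and the draws on it follow from the 4 rounds played. Function names are hypothetical.

```python
def missing_wins(known_wins: list[int], match_total: int,
                 rounds_per_screenshot: int = 4) -> int:
    """Rounds one team won on the single missing screenshot.

    known_wins: rounds this team won on the 3 available screenshots.
    match_total: rounds this team won in the whole match (both maps).
    """
    if len(known_wins) != 3:
        raise ValueError("this rule only applies when exactly 1 of 4 screenshots is missing")
    wins = match_total - sum(known_wins)
    if not 0 <= wins <= rounds_per_screenshot:
        raise ValueError("totals are inconsistent with the available screenshots")
    return wins


def missing_draws(wins_team_a: int, wins_team_b: int,
                  rounds_per_screenshot: int = 4) -> int:
    """Draws on the missing screenshot: rounds played minus rounds won by either team."""
    draws = rounds_per_screenshot - wins_team_a - wins_team_b
    if draws < 0:
        raise ValueError("more wins than rounds played")
    return draws


# Hypothetical match: team A won 9 rounds in total, team B 6 (1 draw overall).
a = missing_wins([3, 2, 2], match_total=9)  # team A on the missing screenshot
b = missing_wins([1, 2, 2], match_total=6)  # team B on the missing screenshot
print(a, b, missing_draws(a, b))  # 2 1 1
```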
In each WNL match, both teams play 2 maps. The two maps were treated as separate cases, because rounds played on open vs. closed maps are subject to different conditions.
On each map 8 rounds are played in two trials of 4 rounds each; the teams swap factions between the trials, so each team plays 4 rounds with each faction. For each data set the following information was gathered from the screenshots:
- WNL week (1-12)
- Match # (a distinct number identifying each match, because the 2 maps of one match were treated as two cases)
- Faction 1 (Swadians, Rhodoks, Sarranids, Nords, Vaegirs)
- Faction 2 (Swadians, Rhodoks, Sarranids, Nords, Vaegirs)
- Map name (San'di'boush, Frosty Battle, Field by the River, Fort of Honour, Vendetta, Dry Valley, Mountain Fortress, Nord Town, Shariz Village, Ruins, Reveran Village, Verloren, Khudan Outskirts, Castle Ruins, Port Azur)
- Map type (open vs. closed)
- rounds won by faction 1 in trial 1
- rounds won by faction 2 in trial 1
- rounds won by faction 1 in trial 2
- rounds won by faction 2 in trial 2
- draws in trial 1
- draws in trial 2
For the following analyses, the variables "rounds won by faction # in trial 1" and "rounds won by faction # in trial 2" were summed into the variable "total rounds won by faction #". Furthermore, "Faction 1" and "Faction 2" as well as "total rounds won by faction 1" and "total rounds won by faction 2" were stacked into the variables "Faction" (covering both factions played on one map in a match) and "total number of rounds won", respectively, so that each map contributes two data sets.
In total, we got 558 valid sets of data for the analyses (108 sets for Vaegirs, 114 for Rhodoks, 115 for Swadians, 117 for Sarranids and 104 for Nords, including 280 sets for open and 278 sets for closed maps).
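The aggregation step turns every analysed map into two data sets, one per faction. It first sums the two 4-round parts per faction, then stacks "Faction 1" and "Faction 2" into a single "Faction" variable. Below is a minimal sketch with a hypothetical record layout and field names. It also includes a check that the counts reported above add up.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class MapCase:
    """One map of one WNL match (hypothetical field names)."""
    faction_1: str
    faction_2: str
    map_type: str   # "open" or "closed"
    f1_part1: int   # rounds won by faction 1 in the first 4-round part
    f2_part1: int
    f1_part2: int   # rounds won by faction 1 in the second 4-round part
    f2_part2: int


def to_long(cases: list[MapCase]) -> list[tuple[str, str, int]]:
    """Sum both parts per faction and stack the two factions into
    (faction, map_type, total_rounds_won) rows: two rows per map."""
    rows = []
    for c in cases:
        rows.append((c.faction_1, c.map_type, c.f1_part1 + c.f1_part2))
        rows.append((c.faction_2, c.map_type, c.f2_part1 + c.f2_part2))
    return rows


def count_by_faction(rows: list[tuple[str, str, int]]) -> Counter:
    """Number of data sets per faction."""
    return Counter(faction for faction, _, _ in rows)


# Consistency check of the counts reported above.
per_faction = {"Vaegirs": 108, "Rhodoks": 114, "Swadians": 115, "Sarranids": 117, "Nords": 104}
per_map_type = {"open": 280, "closed": 278}
assert sum(per_faction.values()) == sum(per_map_type.values()) == 558

example = [MapCase("Vaegirs", "Swadians", "open", 3, 1, 2, 2)]
print(to_long(example))  # [('Vaegirs', 'open', 5), ('Swadians', 'open', 3)]
```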
1.3 Statistical Analyses
To test the hypothesis that the factions are not balanced, i.e., that the average number of rounds won differs between the factions, a 2 x 5 univariate analysis of variance (ANOVA) was conducted with "map type" (open vs. closed) and "faction" (Swadians, Rhodoks, Sarranids, Nords, Vaegirs) as fixed factors and "total number of rounds won" as the dependent variable.
An analysis of variance examines whether the average score of a variable (the mean) differs between conditions (called factor levels; e.g., Swadians is one level of the factor "faction", and "open" is one level of the factor "map type"). Differences in the total number of rounds won between the factions would indicate that the factions are not balanced and have an impact on the battle outcome that is independent of the skill of the teams (players). This effect might be moderated by the map conditions (open vs. closed), i.e., the impact might be larger or smaller on one map type than on the other.
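The core idea of an analysis of variance can be sketched with a one-way version that uses only the factor "faction". It compares how far the group means lie from each other with how much the values scatter within the groups. The actual analysis was a 2 x 5 design with Type III sums of squares, which this sketch does not reproduce. The data are made up.

```python
from statistics import mean


def one_way_anova(groups: dict[str, list[float]]) -> tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    values = [x for g in groups.values() for x in g]
    grand_mean = mean(values)
    group_means = {name: mean(g) for name, g in groups.items()}
    # Between-groups sum of squares: how far the group means lie from the grand mean
    ss_between = sum(len(g) * (group_means[name] - grand_mean) ** 2
                     for name, g in groups.items())
    # Within-groups sum of squares: scatter of the values around their own group mean
    ss_within = sum((x - group_means[name]) ** 2
                    for name, g in groups.items() for x in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


fake = {"A": [1, 2, 3], "B": [4, 5, 6]}
print(one_way_anova(fake))  # (13.5, 1, 4)
```

A large F value, and therefore a small probability of error, means that the group means differ more than the scatter within the groups would lead you to expect by chance.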
2. Results
2.1 Descriptive Statistics
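Table legend: Mean = arithmetic average of the "total number of rounds won" by each faction; Std. Deviation = standard deviation, i.e., how much the values vary around the mean; N = number of cases (data sets) in a given condition (factor level)
As the table shows, the average number of rounds won by the factions deviates from the expected value of 4.0 in the different conditions (open maps, closed maps, and total = across both map types). This suggests that the factions have an impact on the match outcome independent of the skill of the participating teams (players). However, the deviations are quite small, which suggests that the battle outcome depends mainly on the skill of the teams and not on the faction choice.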
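The columns of the descriptive table can be computed from the stacked data in a few lines. This sketch assumes the data sets are available as (faction, rounds won) pairs, which is a hypothetical layout. It uses the sample standard deviation (n - 1 in the denominator), which is what the Std. Deviation column in statistics packages usually refers to. The data below are made up, and each faction needs at least 2 data sets for a standard deviation.

```python
from statistics import mean, stdev


def describe(rows: list[tuple[str, int]],
             expected: float = 4.0) -> dict[str, tuple[float, float, int, float]]:
    """Mean, standard deviation (n - 1 denominator), N and deviation from the
    expected 4.0 rounds for each faction; rows are (faction, rounds_won)."""
    by_faction: dict[str, list[int]] = {}
    for faction, won in rows:
        by_faction.setdefault(faction, []).append(won)
    return {
        faction: (mean(v), stdev(v), len(v), mean(v) - expected)
        for faction, v in by_faction.items()
    }


fake_rows = [("Nords", 5), ("Nords", 3), ("Rhodoks", 4), ("Rhodoks", 2)]
print(describe(fake_rows))
```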
The following inferential analysis tests whether our assumption that the factions are imbalanced also holds for the population (i.e., whether the differences observed in this sample apply to all Warband matches ever played).
2.2 Faction Balancing
2.2.1 Test of between-subjects effects (or: are there significant differences between the factions, between the map types, and in the interaction of map type and faction?)
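Table legend: Type III Sum of Squares = sum of squared deviations attributed to an effect, with each effect adjusted for all other effects in the model; df = degrees of freedom of the effect or of the error term; Mean Square = sum of squares divided by its degrees of freedom; F = test statistic, the ratio of an effect's mean square to the error mean square, from which the probability of error is derived; Significance = probability of error (p), i.e., the probability of observing differences at least as large as these if there were no differences in the population
The analysis of variance revealed that the factions have a significant impact on the battle outcome. The probability of error is 0.001: if the factions were perfectly balanced, differences in the average total number of rounds won at least as large as the observed ones would be expected by chance in only about 0.1% of samples. In scientific research, the level of significance is usually set to 0.05 at most. An effect is therefore considered statistically significant, i.e., supported by empirical evidence, if its probability of error is smaller than 0.05.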
In contrast, the map type neither has a main effect on the battle outcome nor moderates the impact of the factions on it (statistical interaction faction x map type). Neither effect (map type, faction x map type) reaches statistical significance (p < 0.05). However, the probability of error for the interaction is not very high, so there is a tendency toward a moderating effect of the map type on the factions' impact on the battle outcome. If more data were included in the analyses, this effect might become significant, but with the current data we cannot be sure enough that this assumption is true.
Now you might wonder how large the impact of the factions on the match outcome is. The analyses showed that it is quite small (partial eta squared = .034). In plain terms, the faction accounts for only about 3.4% of the variance in the number of rounds won; the remaining variance reflects other factors, mainly the skill of the teams (players). Luckily, this strongly suggests that the factions overall are/were balanced in Warband (before patch 1.157).
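Partial eta squared relates the sum of squares of an effect to that effect plus the error: partial eta squared = SS_effect / (SS_effect + SS_error). It is a share of variance, not a share of individual round outcomes. It can also be recovered from F and the degrees of freedom. The sketch below assumes df_effect = 4 (5 factions - 1) and df_error = 558 - 10 = 548 (558 data sets, 10 cells of the 2 x 5 design). These df values are derived here and not quoted from the ANOVA table. Under these assumptions, the reported value of .034 corresponds to an F of roughly 4.8. Because .034 is rounded, this figure is only approximate.

```python
def partial_eta_squared(ss_effect: float, ss_error: float) -> float:
    """Share of variance attributed to an effect, relative to effect + error."""
    return ss_effect / (ss_effect + ss_error)


def partial_eta_from_f(f: float, df_effect: int, df_error: int) -> float:
    """Same quantity computed from an ANOVA table's F value and degrees of freedom."""
    return f * df_effect / (f * df_effect + df_error)


def f_from_partial_eta(eta_p: float, df_effect: int, df_error: int) -> float:
    """Inverse of partial_eta_from_f."""
    return eta_p / (1 - eta_p) * df_error / df_effect


# Assumed df for the faction effect: 5 factions - 1 = 4;
# error df: 558 data sets - 10 cells of the 2 x 5 design = 548.
implied_f = f_from_partial_eta(0.034, df_effect=4, df_error=548)
print(round(implied_f, 2))  # 4.82, implied by the rounded value .034
```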
2.2.2 Post-hoc comparisons (or which factions differ significantly from each other in the total number of rounds won?)
Note: Although the figure shows differences in the average total number of rounds won, this does not mean that each small difference also exists in the population (all matches ever played in Warband). Most of the differences are very small (e.g., the difference between Nords and Vaegirs on both types of maps), so we need to check statistically whether these differences are significant, i.e., whether their probability of error (p) is below 0.05.
The analyses revealed that, on average, Vaegirs won significantly more rounds than Rhodoks (p < .01), Swadians (p < .01) and Sarranids (p < .05). They did not differ significantly from Nords in the number of rounds won. Nords won significantly more rounds than Rhodoks (p < .01) and Swadians (p < .05). No other comparison between the factions showed a significant difference in the total number of rounds won.
In plain terms, Vaegirs and Nords are stronger than the other factions, but they don't differ in strength from each other. Swadians and Rhodoks are the weakest factions, and Sarranids belong to the weaker factions as well.
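With 5 factions there are 10 possible pairwise comparisons. Testing all of them at p < .05 would inflate the chance of at least one false positive, which is why post-hoc procedures adjust the p-values. The post does not state which procedure was used. The sketch below assumes a Bonferroni correction purely for illustration, and the raw p-values are hypothetical.

```python
from itertools import combinations

FACTIONS = ["Swadians", "Rhodoks", "Sarranids", "Nords", "Vaegirs"]

# All unordered pairs of factions: 5 * 4 / 2 = 10 comparisons.
PAIRS = list(combinations(FACTIONS, 2))


def bonferroni(raw_p: dict[tuple[str, str], float]) -> dict[tuple[str, str], float]:
    """Multiply each raw p-value by the number of comparisons (capped at 1)."""
    m = len(raw_p)
    return {pair: min(1.0, p * m) for pair, p in raw_p.items()}


# Hypothetical raw p-values: 0.004 for every pair (illustration only).
adjusted = bonferroni({pair: 0.004 for pair in PAIRS})
print(len(PAIRS), adjusted[("Rhodoks", "Vaegirs")])
```

A raw p of 0.004 thus becomes 0.04 after correction for 10 comparisons. It still stays below the 0.05 level, whereas a raw p of 0.01 would not.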
If we have a closer look at the comparison of open vs. closed maps, Figure 1 shows that Swadians and Rhodoks perform equally on both types of maps, Vaegirs and Nords both perform slightly better on closed maps, and Sarranids perform better on open than on closed maps. However, since the interaction between faction and map type was not significant, these map-specific differences between the factions may well be due to chance. Therefore, no further comparisons between the factions on each map type were made.
3. Discussion and Conclusions
In summary, the analyses revealed an imbalance between the factions: Vaegirs and Nords are significantly more likely to win, independent of the skill of the team (players) playing them. However, the impact of the factions on the battle outcome is rather small; the faction accounts for only 3.4% of the variance in rounds won, the rest being attributable mainly to the players' skills. Therefore, the factions are/were overall balanced in Warband (v1.153).
Conclusions concerning the balancing after the changes of Warband patch v1.157
To improve the balance between the factions, patch v1.157 mainly focuses on the following aspects:
- reducing the archer athletics skill by 1 point for all factions
- reducing the price of several equipment items, mainly for Swadians
- Vaegir archers no longer have access to a scimitar
If we want to predict whether patch v1.157 might improve the balance between the factions, we have to find and discuss potential explanations for the imbalance found in the empirical data.
The stats that determine the fighting performance of each faction's classes show that the observed differences in the total number of rounds won are reflected in the skill points of each faction (for reference see this thread or have a look at module_troops.py in the Warband module system of version v1.153). Besides skill points, proficiencies and equipment might influence the performance of the factions as well.
Total amounts of proficiency and skill points (faction: proficiency points | skill points):
Swadians: 1515 | 55
Vaegirs: 1660 | 49
Nords: 1775 | 58
Rhodoks: 1770 | 48
Sarranids: 1600 | 49
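The different distribution of total skill points and proficiencies among the factions suggests that attempts to improve the balance between the factions should aim at a more equal distribution. Looking at the changes made by the patch, we only find a change in skill points that affects all factions. Therefore, this approach is unlikely to improve the balance between the factions. In plain terms, the empirical data suggest that the archer nerf does not improve the balance between factions. Moreover, since Swadians and Rhodoks, which have crossbowmen, already perform worse than the factions with bow archers, the archer nerf might even increase the imbalance between the factions.
However, skill points and proficiencies are not associated strongly enough with the total number of rounds won (correlation r < .30) to fully explain the observed differences. It is more likely that the observed differences are based on the different equipment available to each faction. The improved access to better armour for Swadians might help to improve their performance in matches. However, Rhodoks and Sarranids, which perform similarly to Swadians, don't profit from these changes. The Vaegirs, which together with the Nords are the strongest faction, might perform slightly worse if their archers no longer have access to a scimitar. However, Sarranid archers have access to a scimitar as well and don't perform well, especially on closed maps. We therefore have to assume that the observed strength of the Vaegirs results from the equipment and skills of their infantry rather than from the melee skills and equipment of their archers.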
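The post reports r < .30 but not how this correlation was computed. One plausible reading, assumed here, is a Pearson correlation across the data sets, where each data set inherits the skill-point total of the faction that played it. An r below .30 means r squared is below .09, so the point totals would share less than 9% of their variance with the rounds won. The rounds-won values below are hypothetical. Only the point totals come from the list above.

```python
from math import sqrt

# Total proficiency and skill points per faction (v1.153), as listed above.
POINTS = {
    "Swadians": (1515, 55),
    "Vaegirs": (1660, 49),
    "Nords": (1775, 58),
    "Rhodoks": (1770, 48),
    "Sarranids": (1600, 49),
}


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def skill_vs_rounds(rows: list[tuple[str, int]]) -> float:
    """Correlate each data set's faction skill-point total with its rounds won."""
    xs = [POINTS[faction][1] for faction, _ in rows]
    ys = [won for _, won in rows]
    return pearson_r(xs, ys)


# Hypothetical data sets (faction, total rounds won), for illustration only.
fake_rows = [("Nords", 5), ("Rhodoks", 3), ("Vaegirs", 5), ("Swadians", 3), ("Sarranids", 4)]
print(round(skill_vs_rounds(fake_rows), 2))
```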
Furthermore, Nords and Rhodoks are not affected by the faction-specific balancing changes of the patch. And since Rhodoks are one of the weaker factions compared to Nords (a faction as strong as the Vaegirs), this aspect of the imbalance is not resolved by the patch.
Taken together, the patch is more likely to shift the existing imbalance between the factions than to improve the balance. And considering that the imbalance is quite small overall, applying the changes is not advisable in any case, because you will likely get a different kind of imbalance and even risk increasing the current one.
Edit: Skill points and proficiencies were corrected in the spoiler of the discussion.