I find major variances between battles if I simply switch sides with the AI in custom battle. For example, swapping between 100 Imperial Legionary and 100 Sturgian Heavy Spearman, whichever I'm controlling wins (as Arcor vs Ethilde) even if all I do is F1+F3.
That's why testing like that doesn't yield good results. It's a common mistake every single tester makes early on because "it cuts down on time": it gives you results much faster, but they are of questionable utility. The same goes for those who use RTS to speed things up: it's much faster, but it significantly changes the way the game calculates damage.
You need a "standardized" enemy, a lot of patience, and a wide knowledge of the combat AI mechanics; then you gather the numbers (kills and deaths) of the unit that isn't the "standard".
Kills are more important when the unit loses while deaths are more important when the unit wins.
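To make the kills/deaths idea concrete, here is a minimal Python sketch of how repeated runs against the same standard enemy could be summarized. The names (`BattleResult`, `summarize`) are made up for illustration, and the numbers would come from your own notes, not from any game API.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class BattleResult:
    kills: int   # enemies killed by the tested (non-standard) unit
    deaths: int  # losses suffered by the tested unit
    won: bool    # whether the tested unit won the battle


def summarize(results):
    """Summarize repeated runs of one tested unit against one standard enemy."""
    if not results:
        raise ValueError("need at least one battle result")
    wins = [r for r in results if r.won]
    losses = [r for r in results if not r.won]
    return {
        "win_rate": len(wins) / len(results),
        # When the unit wins, its deaths are the number to look at.
        "avg_deaths_when_winning": mean(r.deaths for r in wins) if wins else None,
        # When the unit loses, its kills are the number to look at.
        "avg_kills_when_losing": mean(r.kills for r in losses) if losses else None,
    }
```

You would fill the list by hand after each battle, for example `summarize([BattleResult(100, 40, True), BattleResult(70, 100, False)])`.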
The mistake you see a lot when picking a standard unit is using the Legionary. A lot of people assume that if a unit is good against a T5 unit, it is automatically good against lower tiers.
That couldn't be further from the truth, especially for melee combat. Some troops are miles better against lower tiers than higher ones, and vice versa.
To avoid this you have to run tests against the entire spectrum, tier 1 to 5. But since one battle with five different tiers of units would be heavily RNG dependent, you are kinda forced to run each test, for every tier, separately.
Then you compare the numbers among the troops and you find which one performs better.
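As a rough sketch of that bookkeeping in Python: each candidate troop gets its own list of (kills, deaths) results per enemy tier, and the averages are laid out tier by tier so you can compare troops side by side. The function name and the data layout are my own assumptions, nothing the game exports.

```python
from statistics import mean


def compare_by_tier(data):
    """Build a per-tier comparison table.

    data:    {unit_name: {enemy_tier: [(kills, deaths), ...]}}
    returns: {enemy_tier: {unit_name: (avg_kills, avg_deaths)}}
    """
    table = {}
    for unit, runs_by_tier in data.items():
        for tier, runs in runs_by_tier.items():
            avg_kills = mean(kills for kills, _ in runs)
            avg_deaths = mean(deaths for _, deaths in runs)
            table.setdefault(tier, {})[unit] = (avg_kills, avg_deaths)
    # Sort by tier so the table reads from tier 1 up to tier 5.
    return dict(sorted(table.items()))
```

Keeping the tiers separate in the table, instead of collapsing everything into one score, is what lets you spot a troop that is great against low tiers but poor against high ones.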
But Custom Battles is very reliable when used like that (not when used as a tournament format).
When it comes to tactics, Custom Battles defaults the enemy AI to 150 Tactics, which is quite common to find in your standard NPC.
Then, whether you are the attacker or the defender can change the approach of the AI. Generally speaking, melee tests are run with the player as the defender, so the AI will charge no matter what.
There are mods that you can use to create captain builds (Character Reload) and to add the effect of party-leading perks (Enhanced Battle Test, which requires you to be in a campaign to work).
All you need to do is use Character Reload to give your character the perks, make sure you are the captain of the formation, and profit.
The reason nobody uses them is that perks generally don't favor certain units over others of the same class; they all improve in an almost linear manner (so it's almost redundant).
The only significant difference is between Crossbow perks and Bow perks. Crossbow perks are much worse, so crossbow units have much lower potential with a good build than archers do.
Throwing weapons are also like that: troops that carry more projectiles become exponentially better when perks are added than those that carry fewer.
Still, all of those changes can just be "assumed", as running this kind of test would lead to mostly predictable results.
These are differences that, if you have played enough of the game, you know from experience work like that.
What really matters at the end of the day is the equipment of the unit; perks will improve its stats, but they will improve them for other units as well.
One thing that works every time is testing with no perks but with a good methodology, one that doesn't rely on troop X beating troop Y in order for troop X to be classified as better, but instead looks at whether troop X performed better than troop Y in the same scenario.
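A small Python sketch of what "performed better in the same scenario" could look like: troop X and troop Y never fight each other, each one fights the same standard enemy, and the two outcomes are compared. The decision rule (winner first, then fewer deaths if both won, more kills if both lost) is my own reading of the kills/deaths guideline above, and all names are hypothetical.

```python
from typing import NamedTuple, Optional


class Outcome(NamedTuple):
    won: bool    # did the tested troop beat the standard enemy?
    kills: int   # kills scored by the tested troop
    deaths: int  # losses suffered by the tested troop


def better_performer(name_x: str, x: Outcome, name_y: str, y: Outcome) -> Optional[str]:
    """Return the troop that did better against the same enemy, or None on a tie."""
    if x.won != y.won:
        # Only one of the two beat the standard enemy.
        return name_x if x.won else name_y
    if x.won:
        # Both won: the one that lost fewer soldiers did better.
        score_x, score_y = -x.deaths, -y.deaths
    else:
        # Both lost: the one that took more enemies down did better.
        score_x, score_y = x.kills, y.kills
    if score_x == score_y:
        return None
    return name_x if score_x > score_y else name_y
```

In practice you would feed it averages over several runs rather than a single battle, since one result is too RNG dependent to mean much.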
This way you can tell whoever is watching that this is the baseline they can 100% expect, and perks will only make the troops better.
Testing with no perks also highlights whether a unit needs one type of perk rather than another, for example.
Testing with no build is better than testing with a bad build: a bad build can and will skew the results in favor of one style of unit over another, which ends up making the results even worse.
Then, after you have the data, you need to figure out what it actually means and be able to explain why unit X performed better than unit Y.
When something is weird, it is generally an AI issue; when it's not, the result could have been predicted even before running the tests.
Like how the T4 Falxman is worse than all the T5 shock troops after the AI was fixed in 1.9.
Poor armor, great weapon, same AI: there is no way they are still as good as they were in 1.8, and in fact they aren't. They regressed in every single category compared to their 1.8 performance; still, they are more than likely the best T4.
Anyhow, all of this is to say that Custom Battles is great when used well. A lot of people have stumbled on videos where the feature is not used as well and formed a strong opinion that it's bad for testing units.
Unfortunately a lot of testers, myself included, have been making the same mistakes since the game first launched, and will keep making them. It's part of the growth process to make mistakes.
I invite people not to call names over the video presented above, but instead to invite the creator to understand why the methods are considered skewed and such.