Doesn't custom battle heavily skew results because of differences in AI behavior and commander perks?
I think you need mods to remove those variables for more serious testing. I see major variance between battles just by switching sides with the AI in custom battle. For example, when I swap between 100 Imperial Legionaries and 100 Sturgian Heavy Spearmen (as Arcor vs Ethilde), whichever side I'm controlling wins, even if all I do is F1+F3.