CrazyElf
Sergeant
This is a really technical post and I'm not sure where to put it, so moderators, please move it if it's not in the right spot. Please accept my apologies in advance.
I'm wondering what the best core count would be for Bannerlord. I'm building a new PC, and I'd like to share this because I suspect I'm not the only one with this question. Even if a higher core count CPU is out of your budget today, it will become much cheaper someday, and by the time, say, a hypothetical Mount and Blade 3 comes out in 2030, these high core counts will be mainstream. This game is going to be with us for a while, and it could easily be a decade before M&B3 comes out. The emergence of Zen in 2017 in particular has made 8 cores mainstream.
In my case, I'm planning to build a system with AMD's recently announced Zen 3 CPUs. I'm debating between the 16 core and the 32 core. Bannerlord, of course, is not the only application I am considering (I have a few other applications I'm looking at), but I'd be very interested in the engine team developers' opinions here. Likewise, when you are buying parts for your PC, make sure you consider everything you intend to use that PC for, not just Bannerlord.
The dilemma is very simple. While Bannerlord does use many cores (a huge improvement, and big thanks to the engine team for this), not everything is multithreaded (i.e., the application is not embarrassingly parallel). And the more cores a CPU has, the lower its average clock speed tends to be.
This is a very long post and I apologize for the length, but this is a passion of mine, so why not. I just like making threads like this for fun, and right now I have some time on my hands. Let's just say that Bannerlord is not the only application I'm doing research on.
What information I have found from doing my research
I've done extensive research, and here is what I have found so far:
It will utilize all of the cores in your system, but that doesn't mean more cores are always the better. There are some workloads that cannot happen in parallel, so single thread performance is also important. So even though we haven't tried yet, I'd be surprised if 3990x would be faster than 3970x, considering its significantly lower base clockspeed(2.8 vs 3.7).
So Mustafa believes that the 64 core is probably slower than the 32 core. I agree with him, and you'll see my calculations later in this post.
Now here is another quote, from Murat Türe, who is also on the TaleWorlds engine team:
Mount & Blade II: Bannerlord relies even more on CPU usage than Warband. Hundreds of characters, more advanced animations, an Inverse Kinematics system, individual AI, formation AI, combat calculations, (which do not change in respect to distance or visibility,) and many other requirements really increase the burden on the CPU. In order to accommodate this, our optimisation efforts are more heavily focused on the CPU. We generally try to use Data Oriented Design, which enables us to achieve high amounts of parallelism and core usage. Currently, 60-70% of the frame is fully parallel, which means it can, and will, use all of the cores of current and next gen CPUs for the foreseeable future, (the old engine generally used to use 1, or at most 2 cores.) This means that as new, higher core count CPUs begin to emerge, Bannerlord will scale well with the new hardware and players will be able to test bigger and denser battles. Currently our aim for battle sizes on current generation high end gaming CPUs is at 800 characters, at 60FPS.
So this is hugely valuable.
Core scaling is limited by something called Amdahl's Law. For the sake of argument, let's assume that 75% of each frame's work can be multithreaded. Keep in mind that the post saying 60-70% of the frame was multithreaded is from 2017, so I'm assuming 75% today to account for some modest improvements since then.
A bit more reading: https://www.pugetsystems.com/labs/articles/Estimating-CPU-Performance-using-Amdahls-Law-619/
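So we can see that adding cores is already giving diminishing returns after 8 cores. Here is the formula that we care about (Amdahl's Law), where P is the fraction of the work that can run in parallel and N is the number of cores:
Speedup(N) = 1 / ((1 - P) + P / N)
So in this case, assuming 16, 32, and 64 cores @ 75% threading (P = 0.75), we get a scaling of about: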
- 3.37x for the 16 core CPU
- 3.66x for the 32 core CPU
- 3.82x for the 64 core CPU
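To double-check these numbers, here is a minimal Python sketch of Amdahl's Law. It assumes, as above, that 75% of each frame can run in parallel (my estimate, not a TaleWorlds figure), and the function name `amdahl_speedup` is just an illustrative name. It also prints how much each doubling of the core count adds, which shows the diminishing returns.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Amdahl's Law: speedup = 1 / ((1 - P) + P / N)."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Assumed share of each frame that runs in parallel (my estimate, not a TaleWorlds figure).
P = 0.75

previous = None
for cores in (1, 2, 4, 8, 16, 32, 64):
    speedup = amdahl_speedup(P, cores)
    if previous is None:
        print(f"{cores:>2} cores: {speedup:.2f}x")
    else:
        # Extra speedup gained by doubling the core count; it shrinks each time.
        gain_pct = (speedup / previous - 1.0) * 100.0
        print(f"{cores:>2} cores: {speedup:.2f}x (+{gain_pct:.1f}% over {cores // 2} cores)")
    previous = speedup

# No matter how many cores are added, the speedup can never exceed 1 / (1 - P).
print(f"Upper limit: {1.0 / (1.0 - P):.2f}x")
```

With P = 0.75 this reproduces the 3.37x / 3.66x / 3.82x figures above, and it shows that even infinitely many cores could never get past a 4x speedup.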
Now let's look at the existing core count vs clockspeed trade-offs
Let's take a look at 3 reviews of Zen 2:
- The 16 core 3950X:
- The 32 core 3970X:
- The 64 core 3990X:
I've pulled out the important numbers below. Gamers Nexus is probably the best computer hardware reviewer on YouTube, and not just for gamers.
- The 16 core 3950X seemed to settle at around 3.92 GHz on average across all 16 cores.
- The 32 core 3970X seemed to settle at around 3.83 GHz on average across all 32 cores.
- The 64 core 3990X seemed to settle at around 3.13 GHz on average across all 64 cores.
So it looks like there is a small drop in average all-core clock speed going from 16 cores to 32 cores, but a much bigger drop going from 32 cores to 64 cores (at least if the CPUs are to stay within a reasonable TDP). One other consideration: as of 2020, Windows splits logical processors into processor groups of at most 64, so software that isn't written to span multiple groups can only use 64 threads. Linux does not have this limitation.
For further reading: https://www.anandtech.com/show/15483/amd-threadripper-3990x-review/3
Let's take the scaling expected from the extra cores and multiply it by the average all-core clock speed, assuming 75% of the work can be made parallel:
CPU | Scaling via Amdahl's Law @ 75% | Average all-core clock (GHz) | Total (scaling x GHz)
16 cores (3950X) | 3.368 | 3.92 | 13.204
32 cores (3970X) | 3.657 | 3.83 | 14.007
64 cores (3990X) | 3.821 | 3.13 | 11.959
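The same estimate can be scripted so it's easy to plug in new numbers, for example Zen 3 clocks once reviews are out. This is a rough sketch under my assumptions: 75% of the work is parallel, and each CPU's clock is the Gamers Nexus all-core average listed above. The `Cpu` class and `estimated_score` function are illustrative names of my own, and the score ignores cache, memory, and other platform differences, so treat it as a relative number, not a frame rate.

```python
from dataclasses import dataclass


@dataclass
class Cpu:
    name: str
    cores: int
    avg_all_core_ghz: float  # observed all-core average clock under load


def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Amdahl's Law: speedup = 1 / ((1 - P) + P / N)."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)


def estimated_score(cpu: Cpu, parallel_fraction: float) -> float:
    """Rough relative score: core-count scaling times average all-core clock."""
    return amdahl_speedup(parallel_fraction, cpu.cores) * cpu.avg_all_core_ghz


# Zen 2 all-core averages from the Gamers Nexus reviews above.
CPUS = [
    Cpu("3950X", 16, 3.92),
    Cpu("3970X", 32, 3.83),
    Cpu("3990X", 64, 3.13),
]

P = 0.75  # assumed parallel fraction

scores = {cpu.name: estimated_score(cpu, P) for cpu in CPUS}

# Print from best to worst estimated score.
for name, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {score:.2f}")
```

Swapping P for the 60-70% range from the 2017 post narrows the gap between the 16 and 32 core, but with these clock numbers the order stays the same: 32 core first, 16 core second, 64 core last.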
So the 32 core comes out on top in this estimate. This supports Mustafa's expectation that the 64 core would be slower, because of its lower per-core clock speeds.
The 32 core and 16 core are close, though (about 6% apart). From a budget standpoint, it makes little sense to go with 32 cores, unless of course you have other applications that need the extra cores (I do in my case, as I have some applications I run outside of this game that are closer to being "embarrassingly parallel").
Now that Zen 3 has been announced (and perhaps, by the time you read this, even more efficient CPUs are out), it is possible that future CPUs will allow the jump from 16 to 32 cores with no loss in average per-core clock speed (e.g., if every core can run at the limits of silicon, or whatever replaces silicon).
That being said, we are running into diminishing returns, and most people should buy the 16 core or even the 8 core. If you watch the GN reviews in full, the 32 core is actually slower in many games.
Without a doubt, AMD has binned the top dies for the 5950X in the "silicon lottery" (for those who don't know the term, it refers to the fact that, due to manufacturing variation, some chips come out better than others). I would imagine that an even higher bin than the 5950X's goes into Threadripper and EPYC, AMD's higher core count CPUs.
From Tom's Hardware (www.tomshardware.com), "32-Core Overclock: How I Pushed the Threadripper 3970X 1 GHz Over Its Limit":
Another thing I noticed is that AMD appears to save the best chiplets for the 32- and 64-core parts. This is obvious seeing that all four 3970X CPUs I have tried beat my best 3950x and 3900x in max clock speed on water cooling! There's nothing wrong with tight binning, but for overclockers, finding a chip that you can just overclock to be competitive with the next SKU doesn't seem possible on the current-gen AMD chips. If you want to overclock your processor, pay the extra cash and get the X-series, as the non-X series are the chips that don't hit the higher boost bins.
Considering that the jump from 16 to 32 cores carried only a small penalty in average all-core clock speed, I would imagine that CPUs are now efficient enough that the jump from 8 to 16 cores would not cost much average per-core clock speed either. In fact, for Zen 3, the jump from 16 to 32 cores might have no penalty at all, both because the 32 core parts are binned even higher than the 16 core and because of technological advancements. AMD actually claims a 24% performance-per-watt improvement for Zen 3 over Zen 2.
Single threaded performance also matters
So it's a win for the 32 core, right? Not necessarily. Not everything in Bannerlord (or any other game) is multithreaded, and more cores also mean a lower top single-core boost.
- The 16 core 3950X can boost up to 4.7 GHz single core: https://www.amd.com/en/products/cpu/amd-ryzen-9-3950x
- The 32 core 3970X can boost up to 4.5 GHz single core: https://www.amd.com/en/products/cpu/amd-ryzen-threadripper-3970x
- The 64 core 3990X can boost up to 4.3 GHz single core: https://www.amd.com/en/products/cpu/amd-ryzen-threadripper-3990x
Max boost for AMD processors is the maximum frequency achievable by a single core on the processor running a bursty single-threaded workload. Max boost will vary based on several factors, including, but not limited to: thermal paste; system cooling; motherboard design and BIOS; the latest AMD chipset driver; and the latest OS updates.
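To put those boost clocks in perspective, here is a quick sketch comparing each chip's rated max single-core boost (AMD's spec numbers listed above) against the 3950X. It assumes a purely serial, clock-bound stretch of work scales linearly with boost clock, which is a simplification: as AMD's note says, real boost depends on cooling, the motherboard, BIOS, drivers, and more. The `relative_to` helper is an illustrative name of my own.

```python
from __future__ import annotations

# AMD's rated max single-core boost clocks (GHz) for the Zen 2 parts above.
MAX_BOOST_GHZ = {
    "3950X (16 cores)": 4.7,
    "3970X (32 cores)": 4.5,
    "3990X (64 cores)": 4.3,
}


def relative_to(baseline: str, clocks: dict[str, float]) -> dict[str, float]:
    """Percent difference of each CPU's boost clock versus the baseline CPU."""
    base = clocks[baseline]
    return {name: (ghz / base - 1.0) * 100.0 for name, ghz in clocks.items()}


diffs = relative_to("3950X (16 cores)", MAX_BOOST_GHZ)
for name, pct in diffs.items():
    print(f"{name}: {MAX_BOOST_GHZ[name]:.1f} GHz ({pct:+.1f}% vs 3950X)")
```

Under that simplification, a purely single-threaded stretch of a frame would run roughly 4% slower on the 3970X and roughly 8.5% slower on the 3990X than on the 3950X at stock settings.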
It's likely, though, that the 16 core will also boost higher than the 32 or 64 core in the next generation, although that's not certain right now, especially as Zen 3 is more efficient.
What about the smaller core counts? They actually have a lower single-core boost speed in Zen 3.
So where single threaded performance is the bottleneck, the 16 core 5950X is the best choice. Like Zen 2 before it, the 64 core Zen 3 Threadripper will be slower in single core boost, although due to advancements and binning, it is possible that the 32 core 5970X might have the same top boost as the 16 core 5950X.
Still, the possibility that the 16 core might have a higher single-threaded boost clock speed is a good reason to buy the 16 core, even if money were no object, at least going by Zen 2 circa 2020. At the time of writing, the newer CPUs aren't out yet, so this may have changed by the time you read this.
In summary, the 16 core has slightly worse multithreaded scaling, but may have a better top boost. This may change someday as CPU cores get more efficient, but that is the case circa 2020. Given these trade-offs, I personally think the 16 core is probably the best choice right now, although we still need to see the top boost clock of the 32 core Zen 3.
Questions for the developers
@Murat Türe @Mustafa Korkmaz and anyone else on the Taleworlds engine team:
Do you have any thoughts on this?
- Overall, do you think the 16 core is better than the 32 core? We do seem to be running into rapidly diminishing returns here.
- What percentage of the game today (circa 2020) do you think is multithreaded? The blog post in 2017 said that 60-70% of each frame was multithreaded. Has that improved? Do you think that it will improve by launch day?
- Are different scenes more multi-threaded? Imagine a game with a unit limit of just 500, versus say, a unit limit of the full 2048 (the current engine limit). Would the 2048 unit game be more multi-threaded?
- Are there any other advantages to an HEDT CPU? The one that I see is that HEDT CPUs have quadruple channel RAM, but I don't think that this would translate into better FPS. Is Bannerlord CPU memory bandwidth bottlenecked? At 4k resolutions, it is not uncommon for games to be GPU memory bandwidth restricted (it also depends on the GPU), but less so CPU. I don't see how the other benefits of the HEDT platform would be beneficial (ex: more PCIe lanes).
- After the game is released (Ex: we already have a naval expansion planned), are there plans to make a higher percentage of the game multi-threaded?
For everyone interested in building a system, I want to emphasize that this is not the only game that should influence your purchase decision. Right now, the 16 core parts generally seem to do better than the 32 core parts in most games, although that may change in the future. If you run prosumer applications like I do, those must factor into your purchase too.
Finally, if you have read all of this, thank you for taking the time, and thanks to the developers for all of their efforts in making this a great game.