A technical question about Bannerlord scaling and CPU cores - what is the best core count vs clockspeed? Is it possible to ask the devs directly?

CrazyElf

Veteran
This is a really technical post and I'm really not sure where to put this, so moderators, please move this if it's not in the right spot - please accept my apologies in advance.

I'm wondering what the best core count would be for Bannerlord. I'm building a new PC, and I'd like to share this because I suspect I'm not the only one with this question. Even if a higher core count CPU is out of your budget today, it will become much cheaper in the coming years, and by the time, say, a hypothetical Mount and Blade 3 comes out in 2030, these high core counts will be mainstream. This game is going to be with us for a while, and it could easily be a decade before M&B3 comes out. The arrival of Zen in 2017 in particular propelled 8 cores into the mainstream.

In my case, I'm currently planning to build a system with the recently announced Zen 3 CPU cores by AMD. I'm debating between the 16 core and the 32 core. Bannerlord of course is not the only application I am considering (I do have a few other applications that I am looking at), but I'd be very interested in the engine team developers' opinions here. Likewise, when you are buying parts for your PC, make sure you consider everything you intend to use that PC for, not just Bannerlord.

The dilemma is very simple. While Bannerlord does use many cores (a huge improvement and big thanks to the engine team for this), not everything is multithreaded (i.e., the application is not embarrassingly parallel). And the more cores a CPU has, the lower its average clockspeed tends to be.

This is a very long thread and I want to apologize for the length, but I figure, this is a passion of mine, so why not. I just like making threads like this for fun, as right now I have some time on my hands. Let's just say that this is not the only application I'm doing research on.

What information I have found from doing my research

I've done some extensive research and here is what I have found so far:


It will utilize all of the cores in your system, but that doesn't mean more cores are always the better. There are some workloads that cannot happen in parallel, so single thread performance is also important. So even though we haven't tried yet, I'd be surprised if 3990x would be faster than 3970x, considering its significantly lower base clockspeed(2.8 vs 3.7).

So Mustafa believes that the 64 core is probably slower than the 32 core. I agree with him, and you'll see my calculations later in this post.

Now here is another quote from Murat Türe, who also works on the engine team:


Mount & Blade II: Bannerlord relies even more on CPU usage than Warband. Hundreds of characters, more advanced animations, an Inverse Kinematics system, individual AI, formation AI, combat calculations, (which do not change in respect to distance or visibility,) and many other requirements really increase the burden on the CPU. In order to accommodate this, our optimisation efforts are more heavily focused on the CPU. We generally try to use Data Oriented Design, which enables us to achieve high amounts of parallelism and core usage. Currently, 60-70% of the frame is fully parallel, which means it can, and will, use all of the cores of current and next gen CPUs for the foreseeable future, (the old engine generally used to use 1, or at most 2 cores.) This means that as new, higher core count CPUs begin to emerge, Bannerlord will scale well with the new hardware and players will be able to test bigger and denser battles. Currently our aim for battle sizes on current generation high end gaming CPUs is at 800 characters, at 60FPS.

So this is hugely valuable.

Core scaling is limited by something called Amdahl's Law. Let's assume for the sake of argument that 75% of the work per frame can be multi-threaded. Keep in mind that the post about 60-70% being multithreaded was from 2017; I'm assuming 75% today to allow for some modest improvements.

[Image: Amdahl's Law chart (AmdahlsLaw.svg), speedup vs. number of processors]


A bit more reading: https://www.pugetsystems.com/labs/articles/Estimating-CPU-Performance-using-Amdahls-Law-619/

So we can see that the CPU is already facing diminishing returns after 8 cores. Now here is the formula that we care about.

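$$ S(n) = \frac{1}{(1 - p) + \frac{p}{n}} $$

where S(n) is the speedup on n cores and p is the fraction of the work that can run in parallel.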
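One consequence worth spelling out is the ceiling on the speedup. This is a short worked derivation using the standard form of Amdahl's Law; the symbols S, n, and p are my own labels (speedup, core count, parallel fraction), and the three values of p are the 75% guess plus the 60-70% range from the 2017 blog post.

```latex
\[
S(n) = \frac{1}{(1 - p) + \frac{p}{n}}
\quad\Longrightarrow\quad
\lim_{n \to \infty} S(n) = \frac{1}{1 - p}
\]
\[
p = 0.75:\ \frac{1}{0.25} = 4, \qquad
p = 0.70:\ \frac{1}{0.30} \approx 3.33, \qquad
p = 0.60:\ \frac{1}{0.40} = 2.5
\]
```

So under the 75% assumption, no number of cores gets past 4x, which is why the 16, 32, and 64 core results end up so close together.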


So in this case:

Assuming 16, 32, and 64 cores @ 75% threading, we get:

Scaling of about:

  • 3.37x for the 16 core CPU
  • 3.66x for the 32 core CPU
  • 3.82x for the 64 core CPU
If we assume that it is 60-70% as in the post above, then we will get even lower returns. Anyways, I can recalculate later if need be.
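For anyone who wants to redo these numbers, here is a minimal Python sketch of the calculation. It assumes the ideal form of Amdahl's Law (perfect scaling of the parallel part, no scheduling overhead), and the function name `amdahl_speedup` is just my own label. The 60% and 70% rows are the range from the 2017 blog post; 75% is my guess from above.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal speedup over one core when `parallel_fraction` of the work scales perfectly."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Parallel fractions: 60-70% from the 2017 blog post, 75% is the assumption used in this post.
for p in (0.60, 0.70, 0.75):
    row = ", ".join(f"{n} cores: {amdahl_speedup(p, n):.2f}x" for n in (8, 16, 32, 64))
    print(f"{p:.0%} parallel -> {row}")
```

Real scaling will be worse than this, since the formula ignores synchronization costs and memory contention.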

Now let's look at the existing core count vs clockspeed trade-offs

Let's take a look at 3 reviews of Zen 2:

  • The 16 core 3950X:
  • The 32 core 3970X:
  • The 64 core 3990X:
I've pulled out the important parts below, but Gamers Nexus is probably the best reviewer of computer parts on YouTube, and not just for gamers.
  • The 16 core 3950X seemed to end at around 3.92 GHz on average across 16 cores.
  • The 32 core 3970X seemed to end at around 3.83 GHz on average across 32 cores.
  • The 64 core 3990X seemed to end at around 3.13 GHz on average across 64 cores.
If you bother to watch the full GN reviews (and I advise you do), then you'll notice that the Ryzen CPU scheduler is actually really efficient at allocating tasks to the right core and CCD.

So it looks like there is a small average core clockspeed trade-off going from 16 cores to 32 cores, but a bigger loss in per-core clockspeed going from 32 cores to 64 cores (at least to allow the CPUs to remain within a reasonable TDP). One other consideration is that Windows organizes logical processors into processor groups of at most 64, so the 128 threads of the 3990X are split across two groups (as of 2020). Linux does not have this limitation.
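To put a number on "small" versus "bigger", here is a quick Python sketch that turns the three average clocks into percentage drops. The clock figures are the ones I read off the reviews above; the helper name `percent_change` is just for illustration.

```python
# Average all-core clocks (GHz) as read off the reviews quoted above.
avg_clock_ghz = {16: 3.92, 32: 3.83, 64: 3.13}


def percent_change(old: float, new: float) -> float:
    """Relative change from `old` to `new`, in percent (negative means a drop)."""
    return (new - old) / old * 100.0


for small, big in ((16, 32), (32, 64)):
    change = percent_change(avg_clock_ghz[small], avg_clock_ghz[big])
    print(f"{small} -> {big} cores: {change:+.1f}% average clock")
```

By this measure the 16 to 32 core step costs a bit over 2% of average clock, while the 32 to 64 core step costs roughly 18%.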

For further reading: https://www.anandtech.com/show/15483/amd-threadripper-3990x-review/3

Let's take the scaling expected from multiple cores, then multiply that by the average clockspeed across cores (in GHz), assuming 75% can be made parallel:

Cores      Scaling via Amdahl's Law @ 75%   Average clock across cores (GHz)   Total
16 cores   3.368421053                      3.92                               13.20421053
32 cores   3.657142857                      3.83                               14.00685714
64 cores   3.820895522                      3.13                               11.95940299

So the 32 core is optimal in this case. Clearly, Mustafa's claim that the 64 core would be slower is accurate due to the lower clockspeeds per core.

The 32 core and the 16 core are close, though. From a budget standpoint, it makes little sense to go 32 cores, unless of course you have other applications that need the extra cores (I do in my case, as I have some applications that I run outside of playing this game that are closer to being "embarrassingly parallel").

Now that Zen 3 has been announced (and perhaps by the time you read this, even more efficient CPUs are out), it is possible that future CPUs will make the jump from 16 to 32 cores with no loss in average per-core clockspeed (e.g., the cores can go to the limits of silicon, or of whatever replaces silicon).
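Here is a small Python sketch that reproduces the "Total" figures. To be clear, this score (Amdahl speedup times average clock) is a rough heuristic of mine, not a benchmark: it assumes performance is proportional to clockspeed and that 75% of the frame is parallel. The names `amdahl_speedup` and `score` are my own.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal Amdahl's Law speedup over a single core."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)


# Average all-core clocks (GHz) taken from the reviews quoted earlier in this post.
avg_clock_ghz = {16: 3.92, 32: 3.83, 64: 3.13}
PARALLEL_FRACTION = 0.75  # assumption, not a measured figure


def score(cores: int) -> float:
    """Heuristic: multi-core scaling multiplied by the average per-core clock."""
    return amdahl_speedup(PARALLEL_FRACTION, cores) * avg_clock_ghz[cores]


scores = {cores: score(cores) for cores in avg_clock_ghz}
best = max(scores, key=scores.get)

for cores, value in scores.items():
    print(f"{cores} cores: {value:.4f}")
print(f"Highest score: {best} cores")
```

Changing `PARALLEL_FRACTION` or the clock figures is the easy way to see how sensitive the ranking is to those assumptions.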

That being said, we are running into diminishing returns and most people should buy the 16 or even the 8 core. The 32 core is actually slower at many games, if you watch the GN reviews in full.

Without a doubt, AMD has binned the top dies for the 5950X (the "silicon lottery" refers to the fact that, during manufacturing, some chips turn out better than others). I would imagine that beyond the 5950X, an even higher bin is going into Threadripper and EPYC, which are their higher core count CPUs.


Another thing I noticed is that AMD appears to save the best chiplets for the 32- and 64-core parts. This is obvious seeing that all four 3970X CPUs I have tried beat my best 3950x and 3900x in max clock speed on water cooling! There's nothing wrong with tight binning, but for overclockers, finding a chip that you can just overclock to be competitive with the next SKU doesn't seem possible on the current-gen AMD chips. If you want to overclock your processor, pay the extra cash and get the X-series, as the non-X series are the chips that don't hit the higher boost bins.

Considering the jump from 16 to 32 cores had only a small penalty on single threaded average performance, I would imagine that CPUs are efficient enough now that the jump from 8 core to 16 core CPUs would not have much of a penalty in average per core clockspeed across all cores. In fact, for Zen 3, the jump from 16 core to 32 cores might have no penalty, simply because the 32 cores are even higher binned than the 16 cores, and because of technological advancements. AMD for Zen 3 actually claims a 24% performance per watt improvement over Zen 2.

Single threaded performance also matters
So it's a win for the 32 core, right? Not necessarily. Not everything in Bannerlord (or any other game) is multithreaded, and more cores also mean a lower top single-core boost.
Here is AMD's quote:

Max boost for AMD processors is the maximum frequency achievable by a single core on the processor running a bursty single-threaded workload. Max boost will vary based on several factors, including, but not limited to: thermal paste; system cooling; motherboard design and BIOS; the latest AMD chipset driver; and the latest OS updates.

It's likely though that the 16 core can boost higher than the 32 or 64 core for the next generation, although that's not certain right now, especially as Zen 3 is more efficient.

What about the smaller core counts? Well, they actually have a lower single-core boost speed in Zen 3.
[Images: AMD Ryzen 5000 desktop CPU (Zen 3 "Vermeer") slides]



So where single threaded performance is the bottleneck, the 16 core 5950X is the best choice. Like Zen 2 before it, the 64 core Zen 3 Threadripper will be slower in single core boost, although due to advancements and binning, it is possible that the 32 core 5970X might have the same top boost as the 16 core 5950X.

Still, the possibility that the 16 core might have a higher single threaded boost clockspeed is a good reason to buy the 16 core, even if money were no object, at least for Zen 2 circa 2020. At time of this writing, we do not have newer CPUs, so by the time you read this, it may change.

In summary, the 16 core has slightly worse multi-threaded scaling, but may have a better top boost. This may someday change as CPU cores get more efficient, but that is the case circa 2020. I personally think that given these trade-offs, the 16 core is probably the best choice right now, although we still need to see the top clocks for the 32 core Zen 3 boost.

Questions for the developers

@Murat Türe @Mustafa Korkmaz and anyone else on the Taleworlds engine team:

Do you have any thoughts on this?

  1. Overall, do you think the 16 core is better than the 32 core? We do seem to be running into rapidly diminishing returns here.
  2. What percentage of the game today (circa 2020) do you think is multi-threaded? The blog post in 2017 said that 60-70% per frame was multi-threaded. Has that improved? Do you think that it will improve by launch day?
  3. Are different scenes more multi-threaded? Imagine a game with a unit limit of just 500, versus say, a unit limit of the full 2048 (the current engine limit). Would the 2048 unit game be more multi-threaded?
  4. Are there any other advantages to an HEDT CPU? The one that I see is that HEDT CPUs have quadruple channel RAM, but I don't think that this would translate into better FPS. Is Bannerlord CPU memory bandwidth bottlenecked? At 4k resolutions, it is not uncommon for games to be GPU memory bandwidth restricted (it also depends on the GPU), but less so CPU. I don't see how the other benefits of the HEDT platform would be beneficial (ex: more PCIe lanes).
  5. After the game is released (Ex: we already have a naval expansion planned), are there plans to make a higher percentage of the game multi-threaded?

For everyone interested in building a system, I want to emphasize to you - this is not the only game that should influence your purchase decision and there are other games. Generally right now the 16 cores seem to do better than the 32 cores for most games, although that may change in the future. If you do have prosumer applications that you are running like I am, those too must factor into your purchase.



Finally, if you have taken the time to read all of this, thank you, and thanks to the developers for all of their efforts in making this a great game.
 

Maneuverer

Regular
Wow, you certainly did your homework. I am also interested in how Bannerlord performance scales. Soon I will be building my new PC, and I can't decide whether I should take the Ryzen 7 3700X over the Ryzen 5 3600X (core count vs max clock).

The general rule is to spend less money on the CPU and buy a better GPU, but I am not sure that is the case with Bannerlord since there are tons of AI calculations during sieges and battles. I would also like to hear from the devs what the optimal build is. Should I stick to the general rule or put more money into my CPU?
 

CrazyElf

Veteran
From what I can tell, Bannerlord is a more CPU dependent game.

This general rule is a bit more complex. For example, for 1080p 240 fps (or higher) gaming, the bottleneck is often the CPU. Some games at higher player counts in multiplayer (such as the Battlefield series) can also sometimes be CPU bottlenecked. In most games, and especially at higher resolutions, the GPU is the bottleneck.
 

Maneuverer

Regular
I would have thought the same. However, like you mentioned, some stuff just does not benefit from multithreading, and considering the age of the game engine you might not achieve the desired result with 16- or 32-core CPUs. I don't know how relevant it is now, but even a couple of years ago I remember people complaining about modern games that still sucked at multithreaded performance. Bad optimization is still a thing now.

Hope we get some reply from the devs.
 

NLCRich

Squire
Very interesting, highly technical post @CrazyElf. I'd be curious to see how exactly bannerlord tests out in the same scenarios, what sort of CPU utilization it has.

Edit: I removed a part that didn't make so much sense because you already said it.
 

Gambles

Sergeant at Arms
Unfortunately, although CPUs have been prioritizing core count over core speed, games have yet to keep up. Beyond 8 cores and 8 threads, performance in most games seems to hit severely diminishing returns.

You could really ask the whole gaming industry the same thing: when the heck are you guys going to utilize all these damn cores? Haha, I'm pretty sure we've been hounding devs about that since the quad-core Phenoms.
 

Maneuverer

Regular
Official specs cut from Steam game page:

Recommended:
Requires a 64-bit processor and operating system
OS: Windows 10 (64-bit only)
Processor: Intel® Core™ i5-9600K / AMD Ryzen™ 5 3600X
Memory: 8 GB RAM
Graphics: NVIDIA® GeForce® GTX 1060 3GB / AMD Radeon™ RX 580
Storage: 60 GB available space
Additional Notes: These estimates may change during final release.

I have a pretty similar setup: a Ryzen 5 1600 and a GeForce 1060 6GB. The game runs fine at full settings BUT... the body count is set to low-medium and the number of troops in battles is around medium size. There is no quality measure for these two; I just personally think that if I could set the body count to unlimited and increase the army sizes, the battles would feel 100% immersive.

This might be something that core count could improve, but I don't know. When I built my PC 3 years ago, I had in mind that I wanted to be able to run Bannerlord if it ever came out (and it did!). Right now I want to squeeze more out of the game. Like @CrazyElf said, the general rule is a bit more complex, but it does work in my case since I'm a casual player, so the question remains: buy a better CPU or GPU?
 
  1. Overall, do you think the 16 core is better than the 32 core? We do seem to be running into rapidly diminishing returns here.
  2. What percentage of the game today (circa 2020) do you think is multi-threaded? The blog post in 2017 said that 60-70% per frame was multi-threaded. Has that improved? Do you think that it will improve by launch day?
  3. Are different scenes more multi-threaded? Imagine a game with a unit limit of just 500, versus say, a unit limit of the full 2048 (the current engine limit). Would the 2048 unit game be more multi-threaded?
  4. Are there any other advantages to an HEDT CPU? The one that I see is that HEDT CPUs have quadruple channel RAM, but I don't think that this would translate into better FPS. Is Bannerlord CPU memory bandwidth bottlenecked? At 4k resolutions, it is not uncommon for games to be GPU memory bandwidth restricted (it also depends on the GPU), but less so CPU. I don't see how the other benefits of the HEDT platform would be beneficial (ex: more PCIe lanes).
  5. After the game is released (Ex: we already have a naval expansion planned), are there plans to make a higher percentage of the game multi-threaded?

1-Diminishing returns is a good term for explaining our situation. Besides the theoretical problems, on every game including Bannerlord there are some workloads that can be parallelized, and some that can't be. So consider a frame in a combat situation like: (2x ST workload) + (16x MT workload) + (1x ST workload) + (8x MT workload) + (3x ST workload). If you have a 4 core CPU it would take 12x time (2 + 4 + 1 + 2 + 3). For an 8 core CPU with the same architecture it would take 9x time, and for 16 cores it would be 7.5x time. After 16 cores, core count doesn't affect it much; for example, a 32 core CPU would take 6.75x time. So a CPU with faster single threaded performance would be a better solution after a certain number of cores. I wouldn't buy a CPU with less than 8 cores though, considering next-gen consoles have 8 cores.

2-We're always trying to improve how much of a frame can be multithreaded and will work on this until full release.

3-I'm not sure about this. The parts of a frame that are better threaded are indeed related to units(like AI or physics) but they also have single threaded parts. Though, for example campaign parts are mostly single threaded.

4-HEDT CPUs tend to have low single threaded performance so I don't think they're very good for games besides some exceptions. I don't think we have cpu memory bottleneck at all.

5-Yes!

If I'm not mistaken, fastest consumer CPU for Bannerlord right now is 10900K. This may change after 5900X is released, so we'll see :smile:
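The example frame in point 1 above can be sketched in a few lines of Python. This is only an illustration of that toy model: the phase list is copied from the reply, while the names `FRAME_PHASES` and `frame_time` are made up here, and it assumes the MT phases divide perfectly across cores with no overhead.

```python
# (kind, units of work) for each phase of the example frame in the reply above.
# "ST" phases run on one core; "MT" phases are assumed to split evenly across all cores.
FRAME_PHASES = [("ST", 2), ("MT", 16), ("ST", 1), ("MT", 8), ("ST", 3)]


def frame_time(cores: int) -> float:
    """Total time for one frame, in the same relative units as the phase list."""
    total = 0.0
    for kind, work in FRAME_PHASES:
        total += work / cores if kind == "MT" else work
    return total


for cores in (4, 8, 16, 32):
    print(f"{cores} cores: {frame_time(cores)}x time")
```

The ST phases add up to 6 units, so in this model no core count can bring the frame below 6x time, which is why faster single threaded performance matters more once the core count is high.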
 

NLCRich

Squire
Thanks for responding to OP. Very interesting!
 

Blood Gryphon

Master Knight
Super useful information thank you so much!

Also great question/post @CrazyElf
 

CrazyElf

Veteran
Thanks for the replies - we as a community really appreciate it!

1. Yep - not everything can be made parallel, and then there is the matter of scheduling the work efficiently so that it makes the most of being parallel.

2. I agree - this is a challenging task and will be ongoing for many years.

3. Interesting ... this may mean that larger unit counts can take at least partial advantage of higher core counts. Maybe on maps with 2048 units, the game could get the most out of all cores.

4. Agree on the RAM not being a bottleneck. The good news is the leap from 16 to 32 cores has very little penalty in average core performance.
  • The 16 core 3950X seemed to end at around 3.92 GHz on average across 16 cores.
  • The 32 core 3970X seemed to end at around 3.83 GHz on average across 32 cores.
  • The 64 core 3990X seemed to end at around 3.13 GHz on average across 64 cores.

The more I think about it, the 32 core might catch up as cores become more efficient, and maybe even gain a slight advantage if AMD bins aggressively.

5. Great!
 