[Disc] Mod optimization

There's a multitude of ways mods can be optimized, based on what area is causing a slowdown issue to begin with. Personally, lately I have improved performance by scaling down specular and normal map sizes (making them 512x512 if their main texture is 1024x1024, for example). This saves on mod size, and helps prevent issues when there's a lot of textures in your mod (not really an issue for most mods though).
Using proper shaders for your materials, creating good LODs, not overusing flora props on a map. All these things can help improve a mod's stability outside of scripting.
 
What are your performance issues? Is there stuttering or lag or just a low framerate, and where?

Knowing where you have issues is the first thing you should do in diagnosing poor performance. A solution for one framerate problem might cause extra issues if you implement it somewhere it's not needed, etc.
 
Maroon said:
There's a multitude of ways mods can be optimized, based on what area is causing a slowdown issue to begin with. Personally, lately I have improved performance by scaling down specular and normal map sizes (making them 512x512 if their main texture is 1024x1024, for example). This saves on mod size, and helps prevent issues when there's a lot of textures in your mod (not really an issue for most mods though).
Using proper shaders for your materials, creating good LODs, not overusing flora props on a map. All these things can help improve a mod's stability outside of scripting.
Thanks, I will take a look at that.


Kentucky James said:
What are your performance issues? Is there stuttering or lag or just a low framerate, and where?

Knowing where you have issues is the first thing you should do in diagnosing poor performance. A solution for one framerate problem might cause extra issues if you implement it somewhere it's not needed, etc.
I'm worried that the size of 1257's map and the number of parties will slow things down on the user's end. For now I've slowed down a few simple triggers that weren't all that important and ran on less than a 6-hour period.
 
Slowing down the triggers won't really do much, since each trigger is calculated on a single frame, and this causes stuttering on that frame; slowing them down only makes the stuttering happen slightly less often. What you can do, however, is stagger them so that all your 6-hour triggers don't fire at once: have one trigger fire every 6.1 hours, another every 6.21, and so on, so they don't all fire together.
 
Apparently, that is not true:

Leonion said:
gsanders said:
2)  Consider the following triggers:  one hourly, one at 2 hours, one at 4 hours, one at 8 hours, one at 24 hours, one weekly.
the next week, at the same time I'm processing the weekly trigger, also firing at that exact instant is the 24 hour, the 8 hour, the 4 hour, the two hour, and every single 1 hour trigger -- ALL AT ONCE.

    The artful solution is to move as many triggers as possible to prime numbers or slightly different timings, so they don't fire at the same time, spreading the hit on performance out across time. 

NO.
This is a bad idea.

I've seen your optimization in Perisno and thought 'wow, that's a cool solution', and followed like a sheep, doing this to more triggers. But months later I tested it, and we had both underestimated the intelligence of the TW devs.

The counters of all triggers of the same frequency start at different times; they're given different offsets at the start of the game.
I.e., if you have ten 24-hour triggers, they will fire at different times, not all at once.

Whole-number frequencies (like 24, 6, 4) are actually almost a guarantee that triggers will never fire at the same time, because they started running at different times and there is no change in their "pace".
Fractional triggers, however, have a different "pace", so sooner or later they will intersect with each other, having "compensated" for the offset the engine gave them at the start of the game.

You can easily test it by creating multiple triggers of the same frequency, or of frequencies that divide the largest one (sorry, my math English vocabulary is non-existent), like 2, 4, 6, 8, 24, and adding a display_message operation like "Trigger 1/2/3/4 was called!" to each.
You won't receive these messages at the same time.
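For instance, a minimal version of that test in module_simple_triggers.py could look like this (my illustration of the idea, using quick strings for the messages):
Code:
  # Throwaway test triggers - each one prints a message when it fires.
  (24, [(display_message, "@24-hour trigger A fired")]),
  (24, [(display_message, "@24-hour trigger B fired")]),
  (6,  [(display_message, "@6-hour trigger fired")]),
  (2,  [(display_message, "@2-hour trigger fired")]),
  # If the engine gives each trigger its own starting offset, these
  # messages should not all appear on the same game hour.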
______________________________________________________

So this is not a way to optimize triggers.
The best ways are indeed getting rid of big try_for_parties triggers and dividing large triggers into parts, like that enormous faction AI trigger that does a lot of calculations for all factions at once:
Code:
   (0,
   [
     (eq, "$g_recalculate_ais", 1),
     (assign, "$g_recalculate_ais", 0),
     (call_script, "script_recalculate_ais"),
   ]),

Code:
   (0.1,
   [
     (try_begin),
       (is_between, "$g_recalculate_ais_cur_fac", kingdoms_begin, kingdoms_end),
       (try_begin),
         # the current faction is active: recalculate just this one
         (faction_slot_eq, "$g_recalculate_ais_cur_fac", slot_faction_state, sfs_active),
         (call_script, "script_recalculate_ais_for_faction", "$g_recalculate_ais_cur_fac"),
       (else_try),
         # the current faction is inactive: skip ahead to the next active one
         (assign, ":cycl_end", kingdoms_end),
         (assign, ":cycl_beg", "$g_recalculate_ais_cur_fac"),
         (try_for_range, ":faction_no", ":cycl_beg", ":cycl_end"),
           (faction_slot_eq, ":faction_no", slot_faction_state, sfs_active),
           (assign, "$g_recalculate_ais_cur_fac", ":faction_no"),
           (call_script, "script_recalculate_ais_for_faction", "$g_recalculate_ais_cur_fac"),
           (assign, ":cycl_end", 0),  # lowering the loop bound ends the loop early
         (try_end),
       (try_end),
     (else_try),
       # the index ran past the last kingdom: wrap around to the first one
       (assign, "$g_recalculate_ais_cur_fac", kingdoms_begin),
       (call_script, "script_recalculate_ais_for_faction", "$g_recalculate_ais_cur_fac"),
     (try_end),
     (val_add, "$g_recalculate_ais_cur_fac", 1),  # next 0.1-hour tick handles the next faction
   ]),
 
Simply put:

1.  On the Strategic Map, mods can be optimized by reducing the number of nested operations running on a given frame.  try_for_parties and the like are to be used sparingly and carefully (see the sketch at the end of this point).

You can also optimize your map itself, so that pathfinding is pretty cheap.  Look up "navmesh pathfinding" and maybe read up on how A* works to understand fully, but basically... you want as few triangles in your Strategic Map as you can get, without sacrificing needed detail.  It's a true pity that the Navmesh and the rendered mesh are the same thing (without extreme hackery, anyhow).
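As a rough illustration of "sparingly and carefully" (the called script here is hypothetical), the idea is to reject most parties with the cheapest conditions before anything expensive runs, and to run the whole thing no more often than the feature needs:
Code:
  (24,    # once a day, not every hour
   [
     (try_for_parties, ":party_no"),
       (party_is_active, ":party_no"),                                        # cheapest rejections first
       (party_slot_eq, ":party_no", slot_party_type, spt_kingdom_hero_party), # narrow down before any scripts
       (call_script, "script_update_lord_something", ":party_no"),            # hypothetical expensive work
     (try_end),
   ]),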

2.  During Scenes (i.e., battles, towns, etc.), it's largely about asset optimization. 

No, really! 

LODs, guys.  Don't make content without carefully considering LODs and overall load; every single mesh attached to live Actors costs quite a lot of CPU. 

Also, the Horseman AI is a huge CPU hog, for whatever reason; be aware that an Agent riding a Horse is somewhere on the order of 4-5 times as expensive, per frame, as infantry.  Why, I honestly don't know; I feel like there's something really borked there, but it never got looked at, and it appears Bannerlord has the same issue (since they just ported over the AIs).

Lastly, the Navmesh can make or break your Scene, performance-wise.  Don't be a lazy person and just use the automated generator, k?  It's... not really optimal.  I really wish I could re-write it.

Also, writing well-designed Triggers and Events is a big deal, if you're going to write Fancy Code that runs during battles. 

The engine's callbacks are very useful places to hook up complex code, but great caution must be used when writing anything that operates within a one-frame context.  Don't write that Ultra-Kewl Presentation unless it's really, really actually necessary to gameplay.
 
What I've found out so far:

1. Copying Viking Conquest's simple trigger structure, but using an incremental loop instead of randomization. Example:

before:
Code:
  # Setting random walker types
  (36,
   [(try_for_range, ":center_no", centers_begin, centers_end),
      (this_or_next|party_slot_eq, ":center_no", slot_party_type, spt_town),
      (             party_slot_eq, ":center_no", slot_party_type, spt_village),
      (call_script, "script_center_remove_walker_type_from_walkers", ":center_no", walkert_needs_money),
      (call_script, "script_center_remove_walker_type_from_walkers", ":center_no", walkert_needs_money_helped),
      (store_random_in_range, ":rand", 0, 100),
      (try_begin),
        (lt, ":rand", 70),
        (neg|party_slot_ge, ":center_no", slot_town_prosperity, 60),
        (call_script, "script_cf_center_get_free_walker", ":center_no"),
        (call_script, "script_center_set_walker_to_type", ":center_no", reg0, walkert_needs_money),
      (try_end),
    (try_end),
    ]),

after:
Code:
  
  # Setting random walker types
  (0.049,  ## 0.049 hours per tick x 733 centers = once every ~36 hours for each center
  [
   (try_begin),
     (lt, "$g_set_random_walker_types_cur_center", centers_begin), 
     (assign, "$g_set_random_walker_types_cur_center", centers_begin),
   (try_end),

   (try_begin),
     (ge, "$g_set_random_walker_types_cur_center", centers_end), 
     (assign, "$g_set_random_walker_types_cur_center", centers_begin),
   (try_end),
  
   (try_begin),
      (this_or_next|party_slot_eq, "$g_set_random_walker_types_cur_center", slot_party_type, spt_town),
      (party_slot_eq, "$g_set_random_walker_types_cur_center", slot_party_type, spt_village),
      (call_script, "script_center_remove_walker_type_from_walkers", "$g_set_random_walker_types_cur_center", walkert_needs_money),
      (call_script, "script_center_remove_walker_type_from_walkers", "$g_set_random_walker_types_cur_center", walkert_needs_money_helped),
      (store_random_in_range, ":rand", 0, 100),
      (try_begin),
        (lt, ":rand", 70),
        (neg|party_slot_ge, "$g_set_random_walker_types_cur_center", slot_town_prosperity, 60),
        (call_script, "script_cf_center_get_free_walker", "$g_set_random_walker_types_cur_center"),
        (call_script, "script_center_set_walker_to_type", "$g_set_random_walker_types_cur_center", reg0, walkert_needs_money),
      (try_end),
    (try_end),
	
    ######## proceeds to the next center the next time the trigger fires
    (val_add, "$g_set_random_walker_types_cur_center", 1),
    ]),

2.  Fixing the "script_create_kingdom_party_if_below_limit" problem, if you're using that (a rough sketch follows this list):
1. Create new faction slots (one for each party type) to keep track of the number of those parties without needing to call "script_count_parties_of_faction_and_party_type".
2. Comment out the line in "create_kingdom_party_if_below_limit" that calls "script_count_parties_of_faction_and_party_type".
3. Below it, add code that gets the current faction's count slot for that party type and assigns it to ":party_count".
4. In "cf_create_kingdom_party", every time a party of that type is spawned, add 1 to the associated slot of the current faction.
5. In "game_event_simulate_battle", check the template of every defeated party. If the template is one of your party types, get that party's faction and subtract 1 from its associated party-type slot.

3. There's also this, but I didn't find a way to fix it yet.

 
xenoargh said:
Also, the Horseman AI is a huge CPU hog, for whatever reasons; be aware that an Agent riding a Horse is somewhere on the order of 4-5 times as expensive, per frame, as infantry.

Horses do a few things that infantry doesn't, like a kind of inverse kinematics for riding up hills and a more complex acceleration model. Horses also seem to avoid crashing into trees even when there's no navmesh.
 
xenoargh said:
LODs, guys.  Don't make content without carefully considering LODs and overall load; every single mesh attached to live Actors costs quite a lot of CPU. 
Isn't it the GPU that processes *skin* shaders?
https://forums.taleworlds.com/index.php/topic,6575.msg8889230.html#msg8889230

Also, in battles asset optimization is no more important than script optimization.
A good example for me is Perisno/PNB vs PoP.
We in Perisno have **** LODs and meshes, while in PoP they polished them to an insane extent.
However, they have poorly optimized scripts, while in PNB I spent some time working on them.
The result? I can play PNB without stutters at a greater battle size than PoP, despite much worse asset optimization.
But my PC is good.
People with toasters, however, find PoP's asset optimization more important, because they can't play Perisno due to their PC's inability to handle poorly optimized assets. Scripts come second for them.

Kentucky James said:
So is this automatic offset only in simple_triggers? Do triggers offset to any point within the range of the trigger, or is it all staggered to within an hour or something? Is that the same for the battle triggers?
No, the offset is applied to module_triggers too.
Actually, the first time I heard about this offset was when somebody asked on the Forge how to fix an inaccurate event trigger (i.e. they set it to first fire on, say, day 130, via a 130*24 frequency, but instead it got triggered on other days).
As for "Do triggers offset to any point within the range of the trigger" - I think so, since I've sometimes had a weekly budget trigger called 4-5 hours after the start of the game.
As for battle triggers - yep, they have offsets too. 
 
That's weird as hell, and very useful to know.

Regarding LODs, having more than two (lod2, lod4) probably isn't necessary and isn't worth the extra memory. The lod4 is by far the most important, since it'll be rendered at max distance, which isn't all that far. Polygon count is rarely the framerate bottleneck in Warband.
 
Speaking of lod distances, maybe someone will find this visual representation useful:
[Image: diagram of the approximate distances at which each LOD level kicks in]
The red distance numbers are approximate.
 
Yeah, LODs 2-4 are critical, but happily are far enough away that we can get away with big reductions.

As for polycount impact on performance... let's just say that, for Vanilla battle sizes, it's not that important.  When you scale up... it really starts mattering.  Well, that, and you start hitting vertex buffer limits, heh.

Blood and Steel is a mod where, even in the old versions, we were at twice Vanilla's battle size, or more.  I'm currently doing tests with battle sizes between 200-500; I think 500 is about the utter limit with the Warband engine, and it's choppy when there are enough horsemen in the scene.  I'm aiming for smooth gameplay for non-potato machines at around a 450 battle size.
 
Something I've not tried to find out yet but that I'm curious about: if a mesh only has a LOD3, when is that used? Does it only use it if the mesh is between 4400 and 9000 units away, or will it use the LOD at any distance greater than 4400 units? (thanks Leonion for the graphics for that!)
So basically, if a LOD4 is absent, will a LOD3 be used in its place?
 
xenoargh said:
great caution must be used when writing anything that operates within a one-frame context.  Don't write that Ultra-Kewl Presentation unless it's really, really actually necessary to gameplay.
Can you elaborate on this? What kind of problems could this cause?
 
Simplest answer is that if it's heavily recursive, requires any Fancy Math (vector math, distances, etc.) and it runs every frame, you can drag framerates down super-fast. 

The biggest issue I've seen is people write stuff that is try_for_agents that includes another try_for_agents interior loop and then add on vector math operations, etc.  These are potentially very expensive, since performance is n^2, so they need to be tested with large loads to ensure good performance in real-world gameplay. 

In general, best practice for that stuff is to process the Agents in batches distributed over a few frames so that the load's reasonable, stopping the process if the Agent's current counter for <thing> isn't at <value>.  There are very few <things> where a couple of frames matter all that much.  For example, checking Agents to see whether they're in range for <special thing> or are subject to <special effect> can often be distributed in this way.

I'm doing some of these tricks in the next version of Blood and Steel to reduce load for a few things that used to check all of the Agents on a specific frame.  Agent registers are hugely powerful for reducing these loads by allowing distribution of operations over time.

A classic example is an explosion; you want it to damage nearby Agents, but checking every Agent's distance, raycasting collisions, etc., is really expensive.  So you need to do the calculation that cannot be avoided first.

So the right answer, generally, is to space that over some frames; have a register for the explosion and another indicating it still hasn't gotten checked yet.  Then you're checking for active explosions, checking all Agents that haven't been checked yet, etc.
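Here's a minimal sketch of that batching idea as a mission-template trigger. Nothing here is from an actual mod: the globals, the slot_agent_explosion_checked slot, the radius and the called script are all made-up names, and the slot would need resetting whenever a new explosion is triggered.
Code:
  # Check interval 0 = runs every frame, but only while an explosion is pending.
  (0, 0, 0,
   [
     (eq, "$g_explosion_active", 1),
   ],
   [
     (assign, ":checked_this_frame", 0),
     (try_for_agents, ":agent_no"),
       (lt, ":checked_this_frame", 10),       # cap the work done on any single frame
       (agent_is_alive, ":agent_no"),
       (agent_slot_eq, ":agent_no", slot_agent_explosion_checked, 0),
       (agent_set_slot, ":agent_no", slot_agent_explosion_checked, 1),
       (val_add, ":checked_this_frame", 1),
       (agent_get_position, pos1, ":agent_no"),
       (get_distance_between_positions, ":dist", pos1, pos63),      # pos63 = blast centre, stored earlier
       (le, ":dist", 500),                                          # within 5 metres
       (call_script, "script_apply_explosion_damage", ":agent_no"), # hypothetical script
     (try_end),
     (try_begin),
       (eq, ":checked_this_frame", 0),        # nobody left to check: the explosion is done
       (assign, "$g_explosion_active", 0),
     (try_end),
   ]),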
 
xenoargh said:
Simplest answer is that if it's heavily recursive, requires any Fancy Math (vector math, distances, etc.) and it runs every frame, you can drag framerates down super-fast. 

Obviously, but that's not specific to Warband. And Native contains a couple of twice-nested try_for_agents loops which are on their own responsible for the majority of frame issues in ultra-large battles. But even with those, the game runs without a hitch on the largest intended battle settings. From my tests it's the hardcoded combat AI and other overhead like collision that are the main bottleneck, not so much the exposed module system.
For the explosion example you posted, even in very large battles you can loop over all the agents, check the distances and apply damage, and the hitches won't be too noticeable. By eliminating most of the agents early in the loop body with agent_is_human or by specifying a team, you can avoid applying all that logic to every agent in the game and have the code scale better for large battles (something like the loop sketched below).
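That early elimination could look roughly like this; the team global, the damage value and the radius are placeholders:
Code:
  (try_for_agents, ":agent_no"),
    (agent_is_human, ":agent_no"),            # drop horses straight away
    (agent_is_alive, ":agent_no"),
    (agent_get_team, ":team", ":agent_no"),
    (eq, ":team", "$g_target_team"),          # only one team gets the expensive checks
    (agent_get_position, pos1, ":agent_no"),
    (get_distance_between_positions, ":dist", pos1, pos63),  # pos63 = blast centre
    (le, ":dist", 500),                       # within 5 metres
    (store_agent_hit_points, ":hp", ":agent_no", 1),
    (val_sub, ":hp", 20),
    (agent_set_hit_points, ":agent_no", ":hp", 1),
  (try_end),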
 