Armagan: Stall them, Callum, buy us time.
Callum: But I can't do this for long, they are all over me.
Armagan: Arrange a meeting every week if you have to, just stall them. We need until 2024 to complete our master plan.
Callum: I want a raise? Now please?
Armagan: You'll get a raise and a bonus for every abandoned major mod.
Callum: We are not evil, are we boss?
Armagan: Modders are terrible people, you know that.
I harshly criticise TW development methodology, and this thread kind of proved my point for me. That said, I find your criticism unfair. TW is putting a lot of effort into their product; it is just that they do not follow even the simplest of Software Engineering principles. I have seen some code that is simply outrageous. Magic values, code duplication, etc... Why an engineer writes such code and others allow it is beyond me. But this epidemic of malpractice is quite widespread in the world of Software, especially across gaming studios.
Another thing to keep in mind is that games aren't your average tax and accounting software; there isn't a well defined template of the outputs and inputs.
You can test basic things for correctness, like the math library. Maybe you can integrate buildbots to see if a change breaks other platforms, launch the game to see if older savegames crash it, or have special versions of the game that run some kind of limited, automated playtest to check for dead ends or logic problems in a quest line. Maybe lint for odd stats issues.
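As a rough sketch of the stats lint idea, assuming a made-up item table (the Item fields and the rules below are hypothetical, not the game's actual data format):

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Item:
    # Hypothetical fields, chosen only for illustration.
    name: str
    damage: int
    weight: float
    price: int


def lint_items(items: List[Item]) -> List[str]:
    """Return human-readable warnings for stats that look suspicious."""
    warnings = []
    seen_names = set()
    for item in items:
        if item.name in seen_names:
            warnings.append(f"{item.name}: duplicate name")
        seen_names.add(item.name)
        if item.damage < 0:
            warnings.append(f"{item.name}: negative damage")
        if item.weight <= 0:
            warnings.append(f"{item.name}: non-positive weight")
        if item.price == 0 and item.damage > 0:
            warnings.append(f"{item.name}: usable weapon with zero price")
    return warnings


if __name__ == "__main__":
    table = [
        Item("sword", damage=30, weight=1.5, price=100),
        Item("club", damage=-5, weight=2.0, price=10),
    ]
    for warning in lint_items(table):
        print(warning)
```

Something like this could run on every commit and flag data mistakes without launching the game at all.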
But good luck trying to unit test a combat-based 3D sandbox RPG. There are so many possibilities that it's impossible to test them all. Developers do functional live testing and tweaking of their features until they experimentally feel good and fun, then the human QA team starts exploiting the heaps of unintended issues and they slowly get ironed out until people run out of time or it's good enough for most purposes.
But modern games are a marvel of engineering, and they need to compute, process, redraw and finish everything in under roughly 16 milliseconds per frame (the budget for 60 FPS) without hitching.
Add to that the fact that there's a big creative part of self-discovery and seeing what's fun and works or not, so requirements (and your plan) change daily, and you have your answer.
--
Note: I'm not excusing poor-quality software. A game internally may look a bit wonky or ugly because it grows organically, but it may work as intended.
Warband's module system wasn't super pretty either, but the whole thing was comfortable to use, easy to expand and because it mixed tables of data with triggers you can add new items relatively easily without touching anything else.
At the end of the day, as long as you don't have super unreadable spaghetti code you (and modders) should be fine.
Having to work from decompiled C#, without visible comments or a good separation of files, was a step backwards, in my opinion. Something like LuaJIT would have been a better replacement, I think. At least in the original game we had access to the whole (modular) thing, as limited and creative-workaround-prone as it was in some ways. We could delete/stub most of the logic and turn it into something else.
Sorry, but it sounds like you do not have much experience in Software Validation.
First of all, one has to create testable code and if your design or implementation does not allow testing for any reason, you need to revisit your design or implementation accordingly. This is analogous to the testability requirement of any scientific argument. If it cannot be tested, how do we know it is correct, right?
Unit testing involves taking units of programming (whatever it may refer to in your paradigm) and testing them in isolation. So unless "a combat based 3d sandbox RPG" is written as a single unit (which cannot be done anyway), unit testing this software is no different than unit testing any other software. The statement "there are so many possibilities that it's impossible to test them all" also has no place in unit testing (or any other form of white-box test) unless, again, there is a magical unit that does everything.
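To make that concrete, here is a minimal sketch in Python (the game itself is C#, but the idea carries over). The damage_after_armor function and its rule are hypothetical, made up for illustration; they are not taken from TW's code:

```python
import unittest


def damage_after_armor(raw_damage: float, armor: float) -> float:
    """Hypothetical rule: armor absorbs damage 1:1, never going below zero."""
    if raw_damage < 0 or armor < 0:
        raise ValueError("raw_damage and armor must be non-negative")
    return max(0.0, raw_damage - armor)


class DamageAfterArmorTest(unittest.TestCase):
    def test_armor_reduces_damage(self):
        self.assertEqual(damage_after_armor(30.0, 10.0), 20.0)

    def test_damage_never_goes_negative(self):
        self.assertEqual(damage_after_armor(5.0, 50.0), 0.0)

    def test_negative_input_is_rejected(self):
        with self.assertRaises(ValueError):
            damage_after_armor(-1.0, 0.0)


if __name__ == "__main__":
    unittest.main()
```

The unit under test knows nothing about rendering, AI or the campaign map, so the "too many possibilities" argument does not apply to it.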
When testing "a combat based 3d sandbox RPG," the tricky part would be your integration tests, where you test the interplay of your units, and your system tests (end-to-end, performance, stress, etc.), which should still take the "combat," "sandbox" and "RPG" modules (or packages, etc.) separately, as they can be considered separate games within a composite game. Indeed, they actually are.
What you describe as "Live testing" is not a valid test strategy. One of the major requirements of testing is determinism; i.e. the same test suite should always produce exactly the same results for the same source code. This requirement also rules out "Beta testing" from being a formal test. By the way, before you object: there are ways of writing non-deterministic code that can be tested deterministically.
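A minimal sketch of one such way, assuming a hypothetical hit roll: pass the random source in as a parameter, so a test can hand it either a seeded generator or a stub that returns fixed values. The names roll_hit, simulate_hits and FixedRng are invented for this example:

```python
import random


def roll_hit(rng, hit_chance: float) -> bool:
    """Return True if the attack hits; rng must provide random() in [0, 1)."""
    return rng.random() < hit_chance


def simulate_hits(rng, hit_chance: float, swings: int) -> int:
    """Count how many of `swings` attacks hit, using the injected rng."""
    return sum(1 for _ in range(swings) if roll_hit(rng, hit_chance))


class FixedRng:
    """Test stub: always returns the same value instead of a random one."""

    def __init__(self, value: float):
        self.value = value

    def random(self) -> float:
        return self.value


if __name__ == "__main__":
    # Same seed, same sequence of rolls, same result on every run.
    first = simulate_hits(random.Random(42), 0.5, 100)
    second = simulate_hits(random.Random(42), 0.5, 100)
    assert first == second

    # With a stub the outcome is fully controlled by the test.
    assert roll_hit(FixedRng(0.1), 0.5) is True
    assert roll_hit(FixedRng(0.9), 0.5) is False
```

In production you would pass a normally seeded random.Random(); only the tests pin the seed or swap in the stub.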
Also, in all my life as a Software Engineer, I have never heard of "launching some program to see if it can load older files" as a method for testing. Seriously, wtf? What kind of test is this? What is the scope of this test; as in, when this test fails, what do you say failed? The whole product? How do you automate this? In which company, university, institution do they accept this as a valid approach for Software Validation?
Finally, modern games are NOT marvels of engineering; they are often buggy as hell and unoptimised... There is some Software out there, particularly AI software like Google, that can be considered a marvel. When it comes to games, it is pretty much just the hardware that does the magic; it has become so amazingly fast that nowadays it can run even some badly optimised sloppy code smoothly at 1080p. For a guy whose first computer had a state-of-the-art 2 MB RAM, hardware today is f*ing unbelievable.