Of course you cannot test this game, or any other piece of software, with unit tests alone, as those only test individual units. You also need integration tests and, finally, various system tests, including end-to-end tests. However, I think "live testing" as he described it involves running the entire software manually.
Besides, an end-to-end test should never test the entire system, even though it is a black-box system test; it should only test the relevant functionality. That is, an end-to-end test should fail if and only if a component directly related to that functionality fails. Therefore, unless your software has only a single piece of functionality, you should not be running your entire software for any end-to-end test.
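To make the scoping point concrete, here is a minimal Python sketch. Everything in it (Game, CombatSystem, BrokenSaveSystem, and the two scenario functions) is a made-up toy and says nothing about how the actual game is built. It only shows the idea that a scenario for one feature goes through the public interface like a black box, yet does not fail because an unrelated feature is broken.

```python
# Hypothetical toy "game" with two independent features.
# All class and function names here are illustrative assumptions.

class BrokenSaveSystem:
    """Stands in for an unrelated component that is currently failing."""

    def save(self, state):
        raise RuntimeError("save system is broken")


class CombatSystem:
    def attack(self, attacker_damage, defender_hp):
        # Defender HP never drops below zero.
        return max(defender_hp - attacker_damage, 0)


class Game:
    """Public entry point that the scenarios treat as a black box."""

    def __init__(self, combat, saves):
        self.combat = combat
        self.saves = saves
        self.enemy_hp = 100

    def player_attacks(self, damage):
        self.enemy_hp = self.combat.attack(damage, self.enemy_hp)
        return self.enemy_hp

    def save_game(self):
        self.saves.save({"enemy_hp": self.enemy_hp})


def combat_scenario_passes():
    # Exercises only the combat feature through the public interface.
    game = Game(CombatSystem(), BrokenSaveSystem())
    return game.player_attacks(30) == 70 and game.player_attacks(200) == 0


def save_scenario_passes():
    # Exercises only the save feature; this is the one that should fail.
    game = Game(CombatSystem(), BrokenSaveSystem())
    try:
        game.save_game()
    except RuntimeError:
        return False
    return True
```

With the save system broken, only the save scenario reports a failure; the combat scenario still passes because nothing it touches is affected.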
Apart from a few small points, we pretty much agree on everything, including why and how TW failed to deliver a better game. I am not sure about the manpower bit because I do not know how many people they have, but they had so many years. I believe the biggest issue, however, is know-how. Fundamental principles of software engineering have not been followed for this project. It is clear as day to me.
By the way, the comparison between AlphaGo and M&B came from the argument that "M&B is a marvel of Software Engineering", which led me to give an example of an actual marvel. Otherwise, of course I know they are not comparable by any means.

I mean, imagine I dig a hole in my backyard and call it a marvel of civil engineering, and then you sarcastically ask "how about the Eiffel Tower?" and I respond "sorry, but it doesn't make sense to compare a ditch to a tower." Would you admit that the hole is actually a marvel?