Forums or email seem better for long responses....
re: optxcollate as a separate opcode / auto hashing - I guess I'm thinking that [txcollate+sha256] could be added as a new opcode to existing tapscript without requiring varops budgeting to be applied to existing opcodes, so might be interesting to separate out and explore more immediately.
I think you could exhaust 4MB of stack with a tx spending 400 inputs each with a 10kB scriptPubKey ([1SUB DUP NOTIF p CHECKSIGVERIFY ENDIF]*250 or so for a 1 of 250 multisig maybe) and specifying input_scriptpubkey. Hashing as you go with collate would make that okay.
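To make the arithmetic concrete, here is a rough Python sketch of the difference between collating everything and then hashing, versus hashing as you go. The 400 and 10kB figures are the ones above; plain concatenation stands in for whatever serialization txcollate would actually use, and the function names are made up for illustration.

```python
import hashlib

NUM_INPUTS = 400      # inputs in the spending tx (figure from the text above)
SPK_SIZE = 10_000     # 10kB scriptPubKey per input (figure from the text above)

def collate_then_hash(spks):
    """Build the whole collated blob first, then hash it.

    Peak buffer size is the sum of all the scriptPubKeys.
    """
    buf = b"".join(spks)
    return hashlib.sha256(buf).digest(), len(buf)

def hash_as_you_go(spks):
    """Feed each scriptPubKey into a running SHA256.

    Only one scriptPubKey needs to be held at a time.
    """
    h = hashlib.sha256()
    peak = 0
    for spk in spks:
        h.update(spk)
        peak = max(peak, len(spk))
    return h.digest(), peak

# Dummy scriptPubKeys; the contents do not matter, only the sizes.
spks = [bytes([i % 256]) * SPK_SIZE for i in range(NUM_INPUTS)]

digest_a, peak_a = collate_then_hash(spks)   # peak_a is 4,000,000 bytes
digest_b, peak_b = hash_as_you_go(spks)      # peak_b is 10,000 bytes
```

Both paths give the same digest, but only the first one ever needs the full 4MB on hand.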
re: output order, "push a new element under .. last output will be on top" seems contradictory to me. The last/most recent element will be under everything which is the bottom? First/oldest output should be the top and first to be popped afaics.
You might consider moving the witness data to the end of the output and including a flag byte so that you can construct the wtxid correctly with only a single txcollate invocation.
I don't think scriptsig or witness in general is a good thing to introspect; it has too high a risk of making spending utxos together incompatible (already a problem with nlocktime). Pulling out any data requires knowing the script structure and execution path (and also requires DROPN afaics, particularly if the other script is using optx or checkmultisig or opmulti), which is hard to get right and easy to get wrong, as occurred with the ctv-bitvm idea recently.
The intent with the annex is to apply some simple structure to it as consensus, so that pulling data from it can be done easily and safely. [X I ANNEX] to get the entry tagged X from the annex for input I (I=-1 for current input perhaps) would be a thought. That would imply the annex should only include a single entry per tag (not clear to me if multiple entries for a tag is desirable or not). I'm leaning towards individual annex entries being limited to perhaps 127 bytes - if you want longer, [SHA256 X -1 ANNEX EQUALVERIFY] lets you put the long thing on the witness while still committing to it.
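A toy Python model of that lookup, purely as a sketch: the annex is modelled as a tag-to-bytes map with one entry per tag and a 127-byte cap, and `annex_lookup` plays the role of [X I ANNEX]. None of this is a real encoding or API; the names, the dict representation and the exact limit are all assumptions for illustration.

```python
import hashlib

MAX_ENTRY = 127  # per-entry limit floated above; not a settled value

def build_annex(entries):
    """Build one input's annex from (tag, value) pairs.

    Rejects duplicate tags and values longer than MAX_ENTRY bytes.
    """
    annex = {}
    for tag, value in entries:
        if tag in annex:
            raise ValueError(f"duplicate annex tag {tag}")
        if len(value) > MAX_ENTRY:
            raise ValueError(f"annex entry for tag {tag} exceeds {MAX_ENTRY} bytes")
        annex[tag] = value
    return annex

def annex_lookup(annexes, current_input, tag, index):
    """Model of [X I ANNEX]: index -1 means the current input."""
    i = current_input if index == -1 else index
    return annexes[i][tag]

# Model of [SHA256 X -1 ANNEX EQUALVERIFY]: the long data lives on the
# witness, and only its 32-byte hash goes in the annex under tag X.
X = 7                                   # hypothetical tag number
long_data = b"x" * 1000                 # too long for an annex entry
commitment = hashlib.sha256(long_data).digest()

annexes = [build_annex([(X, commitment)])]
ok = hashlib.sha256(long_data).digest() == annex_lookup(annexes, 0, X, -1)
```

The point is that the lookup needs no knowledge of the other input's script structure or execution path, only the tag.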
One reason to limit how much you can introspect other inputs is the costing: e.g. in a coinjoin you can know in advance what the pubkey etc is, but the witness could be arbitrarily large and examining it could increase your script's costs more than you might expect. That has some risk of an O(n^2) blowout, which isn't a consensus risk due to the budgeting, but still seems undesirable if it forces your costs to go up in unexpected ways that you can't really know until you see how the other guy is going to authorise.
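A back-of-the-envelope Python sketch of that concern, assuming (purely for illustration) a cost of one unit per witness byte examined; this is not the actual varops costing, and the function names are made up.

```python
def bytes_examined(witness_sizes, me):
    """Bytes input `me` touches if its script examines every other input's witness."""
    return sum(size for i, size in enumerate(witness_sizes) if i != me)

def total_examined(witness_sizes):
    """Total bytes touched if every input examines every other input's witness."""
    return sum(bytes_examined(witness_sizes, i) for i in range(len(witness_sizes)))

# Doubling the number of inputs roughly quadruples the total work.
small = total_examined([100] * 10)   # 10 * 9 * 100 = 9000
large = total_examined([100] * 20)   # 20 * 19 * 100 = 38000

# My own cost depends on how the other guy authorises: one counterparty
# swapping a 100-byte witness for a 10,000-byte one changes my cost a lot.
expected = bytes_examined([100] * 10, 0)            # 900
actual = bytes_examined([100] * 9 + [10_000], 0)    # 8 * 100 + 10000 = 10800
```

The second half is the bit that bothers me more: `expected` is what you would budget for when signing up to the coinjoin, `actual` is what you get once the other witness shows up.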