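The regular Transformer may have just been unseated as the de facto architecture for LLMs. I'm no expert, but the Differential Transformer (Diff Transformer) looks like a massive improvement over the regular Transformer. Its key idea is differential attention: attention scores are computed as the difference between two separate softmax attention maps, which is meant to cancel out attention noise and keep the focus on the relevant context.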

https://arxiv.org/abs/2410.05258
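For anyone curious what differential attention actually computes, here's a minimal single-head sketch in plain Python (no NumPy), written from the formula in the paper: (softmax(Q1 K1^T / sqrt(d)) - λ · softmax(Q2 K2^T / sqrt(d))) V. It's a simplification, not the paper's implementation: λ is just a fixed scalar here (the paper learns it), Q1/K1/Q2/K2/V are passed in directly instead of being projected from the input, and there's no multi-head setup, normalization, or causal mask. The helper names are my own, hypothetical ones.

```python
import math


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both given as lists of rows."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m):
    return [list(col) for col in zip(*m)]


def softmax_rows(m):
    """Row-wise softmax; subtracts each row's max for numerical stability."""
    out = []
    for row in m:
        mx = max(row)
        exps = [math.exp(x - mx) for x in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def attention_map(q, k):
    """Ordinary scaled dot-product attention weights: softmax(Q K^T / sqrt(d))."""
    d = len(q[0])
    scale = 1.0 / math.sqrt(d)
    scores = matmul(q, transpose(k))
    return softmax_rows([[s * scale for s in row] for row in scores])


def diff_attention(q1, k1, q2, k2, v, lam):
    """Single-head differential attention with a fixed scalar lam (assumption:
    the paper learns lam; here it is just a number you pass in)."""
    a1 = attention_map(q1, k1)
    a2 = attention_map(q2, k2)
    # Subtract the second (scaled) attention map from the first, element-wise.
    diff = [[x - lam * y for x, y in zip(r1, r2)] for r1, r2 in zip(a1, a2)]
    return matmul(diff, v)
```

Two sanity checks on the formula: with λ = 0 this reduces to ordinary softmax attention, and if both maps are identical and λ = 1 everything cancels to zero, which is the extreme case of the "common noise cancels out" idea.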
