nostr:npub1g0flx5zvtsh7c0mjqlzjmqr6djmfn054srpneytg07whhx33s45s95mq4z
> It works especially well on GPUs, and it doesn't require use of CUDA/cuDNN on Nvidia hardware, while achieving comparable performance.
This is very good to hear.
The problem with most alternative implementations today (like Triton) is that they're just thin layers on top of CUDA, so they aren't real "alternatives".