Continuous batching enables 23x throughput in LLM inference and reduces p50 latency - https://www.anyscale.com/blog/continuous-batching-llm-inference
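
A minimal sketch of the idea behind continuous batching, not the Anyscale/vLLM implementation: rather than waiting for an entire static batch to finish, the scheduler re-fills free batch slots with waiting requests after every decode step. The `Request` fields, `generate_next_token`, and `max_batch_size` below are hypothetical stand-ins for illustration.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class Request:
    prompt: str
    max_new_tokens: int
    tokens: list = field(default_factory=list)  # generated tokens so far

def generate_next_token(request: Request) -> str:
    # Placeholder for one decode step (one forward pass) of the model.
    return "<tok>"

def continuous_batching_loop(waiting: deque, max_batch_size: int = 8):
    running: list[Request] = []
    finished: list[Request] = []
    while waiting or running:
        # Admit new requests into any free slots. This is the key difference
        # from static batching, which would wait for the whole batch to drain
        # before starting new work.
        while waiting and len(running) < max_batch_size:
            running.append(waiting.popleft())

        # One decode step for every in-flight request.
        for req in running:
            req.tokens.append(generate_next_token(req))

        # Retire completed requests, freeing their slots for the next iteration.
        still_running = []
        for req in running:
            if len(req.tokens) >= req.max_new_tokens:
                finished.append(req)
            else:
                still_running.append(req)
        running = still_running
    return finished

# Example: short and long requests share the same batch; short ones finish
# early and their slots are reused immediately instead of sitting idle.
queue = deque(Request(prompt=f"q{i}", max_new_tokens=4 if i % 2 else 16) for i in range(10))
results = continuous_batching_loop(queue)
print(len(results), "requests completed")
```

Keeping the batch full at every step is where the throughput and p50 latency gains in the linked post come from.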
