DeepSeek has shared detailed profiling data from its training and inference framework, highlighting communication-computation overlap strategies through PyTorch Profiler trace visualizations. The framework implements DualPipe with MoE layers across different configurations, including EP64/TP1 for training and EP32/TP1 for prefilling, demonstrating balanced expert routing and micro-batch overlap techniques.

https://github.com/deepseek-ai/profile-data
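The shared traces are standard PyTorch Profiler exports viewable in Chrome tracing or Perfetto. As a minimal sketch of how such a trace is captured (the toy model here is a placeholder, not DeepSeek's actual code; on a GPU machine you would also pass `ProfilerActivity.CUDA`):

```python
import torch
from torch.profiler import profile, ProfilerActivity

# Hypothetical toy model standing in for one transformer block (illustrative only).
model = torch.nn.Sequential(
    torch.nn.Linear(64, 256),
    torch.nn.GELU(),
    torch.nn.Linear(256, 64),
)
x = torch.randn(8, 64)

# Record a CPU-only trace; record_shapes attaches tensor shapes to each op.
with profile(activities=[ProfilerActivity.CPU], record_shapes=True) as prof:
    model(x)

# Export in Chrome trace format, the same format as the shared profile data.
prof.export_chrome_trace("trace.json")
```

Opening the resulting `trace.json` in chrome://tracing (or ui.perfetto.dev) shows the per-op timeline; the shared DeepSeek traces use this view to make the compute and communication streams, and their overlap, visible.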

#performanceprofiling #deeplearning #moearchitecture #pytorch #parallelcomputing
