DeepSeek has shared detailed profiling data from its training and inference framework, with PyTorch Profiler traces that show how communication is overlapped with computation. The training profile covers DualPipe with MoE layers at EP64/TP1 (expert parallelism 64, tensor parallelism 1); the prefilling profile uses EP32/TP1 and demonstrates balanced routing and micro-batch overlap.
https://github.com/deepseek-ai/profile-data
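For anyone who wants a number rather than a timeline, here is a minimal sketch of how communication-computation overlap could be estimated from a Chrome-format trace (the JSON layout PyTorch Profiler exports). It assumes GPU kernels appear as complete events (`"ph": "X"`) with category `kernel`, and it classifies communication kernels by a name-substring heuristic. The hint list and the function names are hypothetical, not something taken from the repository.

```python
import json

# Assumption: substrings that mark a kernel as communication. Adjust to the
# kernel names that actually appear in the trace being inspected.
COMM_HINTS = ("nccl", "all_to_all", "alltoall")


def interval_union(intervals):
    """Merge overlapping (start, end) intervals into a sorted, disjoint list."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def overlap_length(a, b):
    """Total intersection length of two sorted, disjoint interval lists."""
    i = j = 0
    total = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            total += hi - lo
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


def comm_overlap_fraction(trace, comm_hints=COMM_HINTS):
    """Fraction of communication-kernel time that runs alongside other kernels."""
    comm, comp = [], []
    for ev in trace.get("traceEvents", []):
        # Keep only complete events in the (assumed) GPU kernel category.
        if ev.get("ph") != "X" or str(ev.get("cat", "")).lower() != "kernel":
            continue
        span = (ev["ts"], ev["ts"] + ev["dur"])
        name = ev.get("name", "").lower()
        (comm if any(h in name for h in comm_hints) else comp).append(span)
    comm, comp = interval_union(comm), interval_union(comp)
    comm_total = sum(end - start for start, end in comm)
    return overlap_length(comm, comp) / comm_total if comm_total else 0.0


def load_trace(path):
    """Load a Chrome-format trace from a JSON file (path is user-supplied)."""
    with open(path) as f:
        return json.load(f)
```

Calling `comm_overlap_fraction(load_trace(path))` on a downloaded trace gives a rough figure only: it ignores per-stream and per-rank structure, so it is best read alongside the timeline view, not in place of it.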
#performanceprofiling #deeplearning #moearchitecture #pytorch #parallelcomputing