AI training efficiency: From Throughput to Goodput
Pretraining a modern large language model (LLM), often with ~100B parameters or more, typically involves thousands of accelerators and massive token corpora, running for days to months. At that scale,…