Jun 8, 2026 · 5 min read

Forecasting How Good a Language Model Will Be — Before You Evaluate It

Cross-entropy loss is smooth but task-agnostic; direct evaluation is expensive and often uninformative early in training. We propose a different signal: cheap proxy metrics computed from the probability distribution a model assigns over expert solutions. We use these metrics to rank models, choose pretraining data, and forecast accuracy over the course of training.
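
To make the idea concrete, here is a minimal sketch of what such a proxy metric might look like. The summary above does not define the metrics, so everything here is an assumption made for illustration: the metric (mean per-token log-probability of expert-written solutions), the function names `expert_solution_score` and `rank_models`, the checkpoint names, and the numbers are all hypothetical. In practice the log-probabilities would come from scoring each expert solution with the model, conditioned on its problem prompt.

```python
from statistics import mean


def expert_solution_score(token_logprobs_per_solution):
    """Mean per-token log-probability over a set of expert solutions.

    Each inner list holds the natural-log probabilities a model assigned
    to the tokens of one expert solution. Scores closer to 0 mean the
    model finds the expert solutions more likely.
    (Illustrative metric only; not necessarily the one used in this post.)
    """
    per_solution = [mean(lps) for lps in token_logprobs_per_solution if lps]
    if not per_solution:
        raise ValueError("need at least one non-empty solution")
    # Average per solution first so long solutions do not dominate.
    return mean(per_solution)


def rank_models(scores_by_model):
    """Return model names sorted from best to worst proxy score."""
    return sorted(scores_by_model, key=scores_by_model.get, reverse=True)


# Made-up log-probs for two hypothetical checkpoints on two expert solutions.
logprobs = {
    "ckpt_a": [[-0.5, -1.0, -1.5], [-2.0, -2.0]],
    "ckpt_b": [[-0.25, -0.5, -0.75], [-1.0, -1.0]],
}

scores = {name: expert_solution_score(lps) for name, lps in logprobs.items()}
ranking = rank_models(scores)
print(scores)
print(ranking)
```

The appeal of a score like this is that it needs only forward passes over a fixed set of solutions: no sampling, no grading, and no waiting for the model to be good enough to solve anything end to end.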