Blog

Short writing on the kinds of engineering problems I want to spend more time on

The Hidden Pipeline Behind LLM Loading

The one line of code that loads a model can take minutes to run. Here is everything that happens behind it: resolving the model ID, downloading the sharded weights, building the architecture, and placing the tensors on the GPU.