2026-01-19
Pretrain GPT-2 on 1 GPU
Make LLM pretraining accessible for everyone
This GitHub repo contains an LLM that anyone can pretrain. We aim to make LLM pretraining more and more accessible to everyone (not less and less, contrary to the common expectation).
GPT-1 has already been replicated in this repo. For GPT-2, the main step is to increase the data size by roughly 10x, but some architectural improvements would help as well.
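As one concrete example of the kind of architectural change worth testing, GPT-2 moved layer normalization to the input of each sub-block (pre-LN) rather than after it. Below is a minimal sketch of such a block in PyTorch; the class and parameter names (`Block`, `n_embd`, `n_head`) are illustrative and not taken from this repo.

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    """Pre-LN transformer block (LayerNorm before each sub-block, as in GPT-2)."""
    def __init__(self, n_embd: int, n_head: int, dropout: float = 0.1):
        super().__init__()
        self.ln1 = nn.LayerNorm(n_embd)
        self.attn = nn.MultiheadAttention(n_embd, n_head, dropout=dropout, batch_first=True)
        self.ln2 = nn.LayerNorm(n_embd)
        self.mlp = nn.Sequential(
            nn.Linear(n_embd, 4 * n_embd),
            nn.GELU(),
            nn.Linear(4 * n_embd, n_embd),
            nn.Dropout(dropout),
        )

    def forward(self, x, attn_mask=None):
        # Normalize first, then attend; residual additions stay on the un-normalized path.
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=attn_mask, need_weights=False)
        x = x + attn_out
        x = x + self.mlp(self.ln2(x))
        return x
```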
Suggest your ideas; we need to measure and validate improvements empirically, as described in the repo.
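For instance, a simple way to validate a proposed change is to train a baseline and a variant on the same data for the same token budget and compare held-out loss. A minimal sketch, assuming hypothetical `get_batch` and `build_model`/`train` helpers that are not part of this repo's actual API:

```python
import torch

@torch.no_grad()
def eval_loss(model, get_batch, n_batches: int = 50) -> float:
    """Average cross-entropy over a fixed number of held-out batches."""
    model.eval()
    total = 0.0
    for _ in range(n_batches):
        x, y = get_batch("val")   # hypothetical validation-batch loader
        logits = model(x)         # (batch, seq_len, vocab_size)
        loss = torch.nn.functional.cross_entropy(
            logits.view(-1, logits.size(-1)), y.view(-1)
        )
        total += loss.item()
    model.train()
    return total / n_batches

# Compare baseline vs. variant at the same token budget, data, and seed:
# baseline = train(build_model(pre_ln=False), tokens=1_000_000_000)  # hypothetical
# variant  = train(build_model(pre_ln=True),  tokens=1_000_000_000)  # hypothetical
# print(eval_loss(baseline, get_batch), eval_loss(variant, get_batch))
```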