A modular, research-friendly Transformer architecture for building and experimenting with small-scale language models.