Research

Research

T-Free: Hierarchical Autoregressive Transformers for Language Fairness and Sovereignty

In this blog post, we take a closer look at a tokenizer-free approach we proposed in a recent paper, termed Hierarchical Autoregressive Transformers (HAT). In particular, we showcase how such a model can be pre-trained on English and then efficiently adapted to a new, previously unseen language.
Read more
Research

In awe at the scale of these tensors – a gentle introduction to Unit-Scaled Maximal Update Parametrization

Together with Graphcore, we recently developed u-μP, a new paradigm for parametrizing neural networks in terms of width and depth. Our approach combines μP, developed by G. Yang et al., with Unit Scaling, a concept introduced by Graphcore.
Read more