AI

Some notes on how attention heads in a transformer model develop during training, how the model uses them, and how their outputs are combined to produce the final weights.
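
As a point of reference for the "combined" step, here is a minimal sketch of standard multi-head self-attention in plain Python. It assumes the usual formulation: each head computes scaled dot-product attention, the head outputs are concatenated per token, and one output projection mixes them. The sizes, the random initialisation, and every name here (attention_head, multi_head_attention, w_q, w_k, w_v, w_o) are illustrative assumptions, not taken from any particular model or library.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def attention_head(x, w_q, w_k, w_v):
    """One head: project to queries/keys/values, then scaled dot-product attention."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_head = len(q[0])
    scores = matmul(q, transpose(k))
    # Each row holds one token's attention weights over all tokens (sums to 1).
    weights = [softmax([s / math.sqrt(d_head) for s in row]) for row in scores]
    return matmul(weights, v), weights


def multi_head_attention(x, heads, w_o):
    """Run every head, concatenate outputs per token, apply the output projection."""
    outputs = [attention_head(x, *head)[0] for head in heads]
    concat = [sum((out[t] for out in outputs), []) for t in range(len(x))]
    return matmul(concat, w_o)


def random_matrix(rows, cols, rng):
    """Small random weights standing in for trained parameters (an assumption)."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


# Illustrative sizes only: 4 tokens, model width 8, 2 heads of width 4.
rng = random.Random(0)
seq_len, d_model, n_heads = 4, 8, 2
d_head = d_model // n_heads

x = random_matrix(seq_len, d_model, rng)
heads = [
    tuple(random_matrix(d_model, d_head, rng) for _ in range(3))  # (w_q, w_k, w_v)
    for _ in range(n_heads)
]
w_o = random_matrix(n_heads * d_head, d_model, rng)

out = multi_head_attention(x, heads, w_o)
print(len(out), len(out[0]))  # one output vector of width d_model per token
```

In this sketch the per-head attention weights are recomputed from the input on every forward pass, while the learned parameters are the projection matrices; the output projection is the only place where the heads are mixed together.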