# Attention

Embedding directions encode semantics; e.g., one direction might correspond to gender.

# Softmax

$ probability_i = e^{x_i} / \sum_{n=0}^{N-1} e^{x_n} $

Logits ($x_i$) -- the raw input to the softmax; unnormalized, not a probability distribution.

Probabilities -- the output of the softmax; non-negative and summing to 1.
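
As a rough illustration of the softmax described above, here is a minimal pure-Python sketch. The function name `softmax` and the example logits are assumptions chosen for illustration, not taken from any particular library. Subtracting the maximum logit before exponentiating is a common numerical-stability trick; it cancels in the ratio and leaves the result unchanged.

```python
import math


def softmax(logits):
    """Turn a list of raw logits into a probability distribution."""
    # Shift by the max so exp() never sees a large positive number.
    # The shift cancels between numerator and denominator.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    # Each probability is e^{x_i} divided by the sum over all e^{x_n}.
    return [e / total for e in exps]


# Hypothetical example logits: unnormalized, any real values are allowed.
logits = [2.0, 1.0, 0.1]
probs = softmax(logits)

print(probs)       # larger logits get larger probabilities
print(sum(probs))  # the outputs sum to 1 (up to floating-point error)
```

Note that the ordering of the logits is preserved in the probabilities, and only differences between logits matter: adding the same constant to every logit gives the same output.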