Self-Attention Mechanism Visualization

An interactive visualization of self-attention in Transformers. Explore and understand how self-attention works in neural networks.


Hyperparameters

Model Dimension

Number of Heads

Dropout Rate

Max Sequence Length
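
These controls correspond to the standard scaled dot-product attention hyperparameters. As a rough sketch, they might be grouped in code as below; the default values are illustrative assumptions, not necessarily the tool's defaults:

```python
from dataclasses import dataclass

@dataclass
class AttentionConfig:
    d_model: int = 64       # model dimension: width of token embeddings (illustrative)
    num_heads: int = 4      # number of parallel attention heads (illustrative)
    dropout: float = 0.1    # dropout rate applied to attention weights (illustrative)
    max_seq_len: int = 16   # maximum input sequence length (illustrative)

cfg = AttentionConfig()
# Each head works on a slice of the model dimension, so it must divide evenly.
assert cfg.d_model % cfg.num_heads == 0
head_dim = cfg.d_model // cfg.num_heads  # d_k = 64 / 4 = 16
```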

Attention Visualization


Step-by-Step Calculation

(The Q, K, and V matrices are displayed here once an input sequence is entered.)
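
The steps below assume Q, K, and V have already been computed. In a Transformer they come from learned linear projections of the input embeddings; here is a minimal NumPy sketch with random weights standing in for the learned ones (all shapes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8                    # illustrative sizes

X = rng.normal(size=(seq_len, d_model))    # input token embeddings
W_q = rng.normal(size=(d_model, d_model))  # stand-in for the learned query projection
W_k = rng.normal(size=(d_model, d_model))  # stand-in for the learned key projection
W_v = rng.normal(size=(d_model, d_model))  # stand-in for the learned value projection

Q = X @ W_q  # queries: what each token is looking for
K = X @ W_k  # keys: what each token offers to be matched against
V = X @ W_v  # values: the content each token contributes
```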


Step 1: Calculate QK^T

$QK^T = Q \cdot K^T$

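As a hand-checkable illustration, here is Step 1 in NumPy with made-up 2-dimensional queries and keys ($d_k = 2$):

```python
import numpy as np

Q = np.array([[1.0, 0.0],   # toy queries, d_k = 2
              [0.0, 1.0]])
K = np.array([[1.0, 1.0],   # toy keys
              [0.0, 1.0]])

scores = Q @ K.T  # scores[i, j] = dot(query i, key j)
print(scores)     # [[1. 0.]
                  #  [1. 1.]]
```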


Step 2: Scale QK^T

$\text{Scaled } QK^T = \frac{QK^T}{\sqrt{d_k}}$

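Scaling by $\sqrt{d_k}$ keeps the dot products from growing with the key dimension, which would otherwise push the softmax in the next step toward near one-hot weights. Continuing the made-up numbers from Step 1:

```python
import numpy as np

d_k = 2
scores = np.array([[1.0, 0.0],   # raw QK^T from Step 1
                   [1.0, 1.0]])

scaled = scores / np.sqrt(d_k)   # divide every score by sqrt(2)
print(scaled)  # approximately [[0.707 0.   ]
               #                [0.707 0.707]]
```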


Step 3: Apply Softmax

$\text{Attention Scores} = \text{softmax}(\text{Scaled } QK^T)$

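The softmax is applied row-wise, turning each query's scores over the keys into a probability distribution. NumPy has no built-in softmax, so the sketch defines a numerically stable one; the input carries over from Step 2:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

scaled = np.array([[0.7071, 0.0],      # scaled QK^T from Step 2
                   [0.7071, 0.7071]])

attn = softmax(scaled, axis=-1)  # each row now sums to 1
print(attn)  # approximately [[0.67 0.33]
             #                [0.5  0.5 ]]
```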


Step 4: Multiply with V

$\text{Output} = \text{Attention Scores} \cdot V$

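Each output row is then a weighted average of the value rows, using the attention scores as mixing weights. Finishing the toy example with made-up values:

```python
import numpy as np

attn = np.array([[0.67, 0.33],  # attention scores from Step 3 (rows sum to 1)
                 [0.50, 0.50]])
V = np.array([[1.0, 2.0],       # toy value vectors
              [3.0, 4.0]])

output = attn @ V  # output[i] = sum_j attn[i, j] * V[j]
print(output)  # [[1.66 2.66]
               #  [2.   3.  ]]
```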

Created with <3 by Jaber Jaber

View on GitHub