Self-Attention Mechanism Visualization

An interactive visualization of self-attention in Transformers. Explore and understand how self-attention works in neural networks.


Hyperparameters

Model Dimension

Number of Heads

Dropout Rate

Max Sequence Length
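
These controls correspond to the standard scaled dot-product attention hyperparameters. As a rough sketch, they might be grouped in code as below; the default values are illustrative assumptions, not necessarily the tool's defaults:

```python
from dataclasses import dataclass

@dataclass
class AttentionConfig:
    d_model: int = 64       # model dimension: width of token embeddings (illustrative)
    num_heads: int = 4      # number of parallel attention heads (illustrative)
    dropout: float = 0.1    # dropout rate applied to attention weights (illustrative)
    max_seq_len: int = 16   # maximum input sequence length (illustrative)

cfg = AttentionConfig()
# Each head works on a slice of the model dimension, so it must divide evenly.
assert cfg.d_model % cfg.num_heads == 0
head_dim = cfg.d_model // cfg.num_heads  # d_k = 64 / 4 = 16
```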

Attention Visualization


Step-by-Step Calculation

(The Q, K, and V matrices are displayed here once an input sequence is entered.)
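
The steps below assume Q, K, and V have already been computed. In a Transformer they come from learned linear projections of the input embeddings; here is a minimal NumPy sketch with random weights standing in for the learned ones (all shapes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8                    # illustrative sizes

X = rng.normal(size=(seq_len, d_model))    # input token embeddings
W_q = rng.normal(size=(d_model, d_model))  # stand-in for the learned query projection
W_k = rng.normal(size=(d_model, d_model))  # stand-in for the learned key projection
W_v = rng.normal(size=(d_model, d_model))  # stand-in for the learned value projection

Q = X @ W_q  # queries: what each token is looking for
K = X @ W_k  # keys: what each token offers to be matched against
V = X @ W_v  # values: the content each token contributes
```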


Step 1: Calculate QK^T

$QK^T = Q \cdot K^T$

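As a hand-checkable illustration, here is Step 1 in NumPy with made-up 2-dimensional queries and keys ($d_k = 2$):

```python
import numpy as np

Q = np.array([[1.0, 0.0],   # toy queries, d_k = 2
              [0.0, 1.0]])
K = np.array([[1.0, 1.0],   # toy keys
              [0.0, 1.0]])

scores = Q @ K.T  # scores[i, j] = dot(query i, key j)
print(scores)     # [[1. 0.]
                  #  [1. 1.]]
```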


Step 2: Scale QK^T

$\text{Scaled } QK^T = \frac{QK^T}{\sqrt{d_k}}$

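Scaling by $\sqrt{d_k}$ keeps the dot products from growing with the key dimension, which would otherwise push the softmax in the next step toward near one-hot weights. Continuing the made-up numbers from Step 1:

```python
import numpy as np

d_k = 2
scores = np.array([[1.0, 0.0],   # raw QK^T from Step 1
                   [1.0, 1.0]])

scaled = scores / np.sqrt(d_k)   # divide every score by sqrt(2)
print(scaled)  # approximately [[0.707 0.   ]
               #                [0.707 0.707]]
```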


Step 3: Apply Softmax

$\text{Attention Scores} = \text{softmax}(\text{Scaled } QK^T)$

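The softmax is applied row-wise, turning each query's scores over the keys into a probability distribution. NumPy has no built-in softmax, so the sketch defines a numerically stable one; the input carries over from Step 2:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

scaled = np.array([[0.7071, 0.0],      # scaled QK^T from Step 2
                   [0.7071, 0.7071]])

attn = softmax(scaled, axis=-1)  # each row now sums to 1
print(attn)  # approximately [[0.67 0.33]
             #                [0.5  0.5 ]]
```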


Step 4: Multiply with V

$\text{Output} = \text{Attention Scores} \cdot V$

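Each output row is then a weighted average of the value rows, using the attention scores as mixing weights. Finishing the toy example with made-up values:

```python
import numpy as np

attn = np.array([[0.67, 0.33],  # attention scores from Step 3 (rows sum to 1)
                 [0.50, 0.50]])
V = np.array([[1.0, 2.0],       # toy value vectors
              [3.0, 4.0]])

output = attn @ V  # output[i] = sum_j attn[i, j] * V[j]
print(output)  # [[1.66 2.66]
               #  [2.   3.  ]]
```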

Created with <3 by Jaber Jaber

View on GitHub