DL0052 Rotary Positional Embedding

What is Rotary Positional Embedding (RoPE)?

Answer

Rotary Positional Embedding (RoPE) is a positional encoding method that rotates query and key vectors in multi-head attention by position-dependent angles. This rotation naturally encodes relative positional information, improves generalization to longer contexts, and avoids the limitations of fixed or learned absolute positional embeddings. It is used in models such as GPT-NeoX, LLaMA, PaLM, and Qwen.
Its key characteristics are:
(1) Relative position encoding method for Transformers
(2) Applies rotation to query (Q) and key (K) vectors using position-dependent angles
(3) Encodes position via geometry, not by adding vectors
(4) Preserves relative distance naturally in dot-product attention
(5) Extrapolates well to longer sequences than the training length

RoPE rotates each 2D pair of hidden dimensions:
f(x, m)=\begin{pmatrix}\cos(m\theta) & -\sin(m\theta) \\ \sin(m\theta) & \cos(m\theta)\end{pmatrix}\begin{pmatrix}x_1 \\x_2\end{pmatrix}
Where:
 m represents the absolute position of the token in the sequence.
 \theta represents the per-pair rotation frequency; in practice the i-th dimension pair uses \theta_i = 10000^{-2i/d}, so different pairs rotate at different rates.
 x_1, x_2 represent the two components of the 2D embedding pair being rotated.
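
As a minimal sketch (not from the source), the per-pair rotation above can be written in NumPy. The helper name rope_rotate and the base-10000 frequency schedule are assumptions following the common RoPE setup:

import numpy as np

def rope_rotate(x: np.ndarray, m: int, base: float = 10000.0) -> np.ndarray:
    """Rotate an even-dimensional vector x for absolute position m.

    Each consecutive 2D pair (x_{2i}, x_{2i+1}) is rotated by the angle
    m * theta_i, with theta_i = base^(-2i/d) as in the usual RoPE setup.
    """
    d = x.shape[-1]
    assert d % 2 == 0, "RoPE expects an even head dimension"
    i = np.arange(d // 2)
    theta = base ** (-2.0 * i / d)        # per-pair rotation frequencies
    angles = m * theta                    # position-dependent angles
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[..., 0::2], x[..., 1::2]   # split into 2D pairs
    out = np.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin  # first component of each pair
    out[..., 1::2] = x1 * sin + x2 * cos  # second component of each pair
    return out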

A plot of attention scores against token distance (omitted here) visualizes how RoPE makes attention decay smoothly with relative distance, while standard sinusoidal PE reflects absolute-position similarity.
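
As an illustrative check in place of that plot (a sketch, not from the source), the dot product between rotated query and key vectors depends only on their relative offset, using the rope_rotate helper defined above and an arbitrary toy head dimension of 8:

rng = np.random.default_rng(0)
q = rng.standard_normal(8)   # toy query vector
k = rng.standard_normal(8)   # toy key vector

s1 = rope_rotate(q, m=7) @ rope_rotate(k, m=3)      # relative offset 4
s2 = rope_rotate(q, m=107) @ rope_rotate(k, m=103)  # same offset, shifted
print(np.isclose(s1, s2))  # True: the score depends only on m - n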

