What is Rotary Positional Embedding (RoPE)?
Answer
Rotary Positional Embedding (RoPE) is a positional encoding method that rotates query and key vectors in multi‑head attention by position‑dependent angles. This rotation naturally encodes relative positional information, improves generalization to longer contexts, and avoids the limitations of fixed or learned absolute positional embeddings. It is used in GPT-NeoX, LLaMA, PaLM, Qwen, etc.
It has the following characteristics:
(1) Relative position encoding method for Transformers
(2) Applies rotation to query (Q) and key (K) vectors using position-dependent angles
(3) Encodes position via geometry, not by adding vectors
(4) Preserves relative distance naturally in dot-product attention
(5) Extrapolates well to longer sequences than the training length
RoPE rotates each 2D pair of hidden dimensions by a position-dependent angle:

$$
\begin{pmatrix} x'_{2i} \\ x'_{2i+1} \end{pmatrix}
=
\begin{pmatrix} \cos(m\theta_i) & -\sin(m\theta_i) \\ \sin(m\theta_i) & \cos(m\theta_i) \end{pmatrix}
\begin{pmatrix} x_{2i} \\ x_{2i+1} \end{pmatrix},
\qquad
\theta_i = 10000^{-2i/d}
$$

Where: $m$ represents the absolute position of the token in the sequence.
$\theta_i$ represents the base frequency/rotation angle of the $i$-th dimension pair.
$x_{2i}, x_{2i+1}$ represent the components of the embedding vector, and $d$ is the head dimension.
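For concreteness, here is a minimal NumPy sketch of this rotation, applied the way attention would apply it to query and key vectors. The helper name rope_rotate, the interleaved even/odd pairing of dimensions, and the head dimension of 64 are assumptions made for this example rather than details of any specific library; base 10000 is the default from the original RoPE paper.

```python
# Minimal, illustrative RoPE sketch (not taken from any particular library).
import numpy as np

def rope_rotate(x, position, base=10000.0):
    """Rotate each 2D feature pair (x_{2i}, x_{2i+1}) by the angle position * theta_i."""
    d = x.shape[-1]                           # head dimension, assumed even
    i = np.arange(d // 2)
    theta = base ** (-2.0 * i / d)            # per-pair frequencies theta_i
    angles = position * theta                 # m * theta_i
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x)
    out[..., 0::2] = x_even * cos - x_odd * sin
    out[..., 1::2] = x_even * sin + x_odd * cos
    return out

# Characteristic (4): the attention score depends only on the relative offset m - n.
rng = np.random.default_rng(0)
q, k = rng.standard_normal(64), rng.standard_normal(64)

score_near = rope_rotate(q, position=5)   @ rope_rotate(k, position=2)    # offset 3
score_far  = rope_rotate(q, position=105) @ rope_rotate(k, position=102)  # offset 3
print(np.allclose(score_near, score_far))  # True: same relative distance, same score
```

Because the rotation is applied to Q and K before the dot product, no positional vector is ever added to the embeddings; position enters purely through the geometry of the rotation.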
Plotting the attention score against relative distance shows that RoPE makes attention decay smoothly as the relative distance grows, while standard sinusoidal PE reflects absolute-position similarity.
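To illustrate this numerically, the following self-contained sketch uses an all-ones query and key vector (an illustrative assumption): in that special case the RoPE attention score reduces to twice the sum of cos(Δ·θ_i) over the dimension pairs, where Δ is the relative distance, so printing it over growing Δ shows the (oscillating) decay.

```python
# Self-contained sketch of RoPE's long-range decay for all-ones q and k vectors.
# In that special case the score contributed by each 2D pair is 2 * cos(delta * theta_i),
# so the total score is 2 * sum_i cos(delta * theta_i); it decays (with oscillation)
# as the relative distance delta grows.
import numpy as np

d = 64                                  # assumed head dimension
i = np.arange(d // 2)
theta = 10000.0 ** (-2.0 * i / d)       # per-pair frequencies theta_i

for delta in (0, 1, 2, 4, 8, 16, 32, 64, 128, 256):
    score = 2.0 * np.cos(delta * theta).sum()
    print(f"relative distance {delta:4d}: score {score:7.2f}")
```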