Word Embedding Architecture
1. Training (Loss Optimization)
2. Extract Word Vector
3. Cosine Distance
INPUT LAYER
HIDDEN LAYER (100D)
OUTPUT LAYER (ŷ)
Word Vector [0.12, -0.4...]
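The extraction step above can be sketched as a row lookup in the input-to-hidden weight matrix. This is only an illustration under the assumption of a word2vec-style network with a linear hidden layer; the figure does not name the model. The names `W_in`, `vocab`, `one_hot`, `hidden_activation`, and `word_vector` are hypothetical, the token "polizei" is added only to fill out the toy vocabulary, and the weights are random placeholders rather than trained values.

```python
import random

EMBED_DIM = 100  # hidden layer size shown in the figure (100D)

# Toy vocabulary; "polizei" is a hypothetical extra token for illustration.
vocab = ["sicherheit", "kriminalität", "polizei"]
word_to_index = {word: i for i, word in enumerate(vocab)}

# Placeholder input-to-hidden weights (V x D). In a real model these
# would be learned during training, not drawn at random.
rng = random.Random(0)
W_in = [[rng.uniform(-0.5, 0.5) for _ in range(EMBED_DIM)] for _ in vocab]


def one_hot(index, size):
    """Return a one-hot input vector of length `size`."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def hidden_activation(x, W):
    """Compute x @ W for a length-V input x and a V x D matrix W."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(len(W[0]))]


def word_vector(word):
    """Extract the word vector as the matching row of W_in."""
    return W_in[word_to_index[word]]


# With a one-hot input and a linear hidden layer, the hidden activation
# is the same as the extracted row.
x = one_hot(word_to_index["sicherheit"], len(vocab))
h = hidden_activation(x, W_in)
```

Because the input is one-hot, the matrix product selects a single row, which is why the lookup in `word_vector` is enough once training is done.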
Loss Function
Compare Output (ŷ) with True Label (y)
L = -Σ_i y_i·log(ŷ_i)
(Cross-Entropy)
Optimize Weights (Backpropagation)
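The cross-entropy loss above can be computed as in the following minimal sketch. It assumes the output layer applies a softmax and that the true label y is one-hot; the figure states neither explicitly. The function names and the example logits are hypothetical.

```python
import math


def softmax(logits):
    """Turn raw output scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(y_true, y_pred, eps=1e-12):
    """L = -sum(y * log(y_hat)); eps guards against log(0)."""
    return -sum(y * math.log(p + eps) for y, p in zip(y_true, y_pred))


# Hypothetical scores for a three-word vocabulary.
logits = [2.0, 0.5, -1.0]
y_hat = softmax(logits)
y = [1.0, 0.0, 0.0]  # one-hot true label

loss = cross_entropy(y, y_hat)

# For softmax followed by cross-entropy, the gradient of the loss with
# respect to the logits is y_hat - y. Backpropagation starts from this.
grad_logits = [p - t for p, t in zip(y_hat, y)]
```

With a one-hot label the sum reduces to the negative log of the probability assigned to the correct word, so the loss shrinks as that probability approaches 1.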
θ (Angle)
"sicherheit" (German: "security")
"kriminalität" (German: "crime")
Cosine Similarity = cos(θ)
Small θ => High Similarity
Cosine Distance = 1 - cos(θ)
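The two cosine formulas above can be sketched in plain Python as follows. The three-dimensional vectors are made-up placeholders, not real embeddings of the two words, and the function names are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """cos(theta) = (u . v) / (|u| * |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def cosine_distance(u, v):
    """Cosine distance = 1 - cos(theta)."""
    return 1.0 - cosine_similarity(u, v)


# Made-up 3D vectors standing in for the 100D word vectors.
vec_sicherheit = [0.12, -0.40, 0.33]
vec_kriminalitaet = [0.10, -0.35, 0.41]

similarity = cosine_similarity(vec_sicherheit, vec_kriminalitaet)
distance = cosine_distance(vec_sicherheit, vec_kriminalitaet)

# Recover the angle; clamp first so rounding error cannot push the
# value outside the domain of acos.
theta_degrees = math.degrees(math.acos(max(-1.0, min(1.0, similarity))))
```

Cosine similarity ignores vector length and depends only on direction, so the distance ranges from 0 (same direction) to 2 (opposite directions).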