Word Embedding Architecture

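[Figure: word embedding network and cosine similarity between word vectors.]

Network: input layer, hidden layer (100 dimensions), output layer (ŷ). The hidden layer supplies the word vector, e.g. [0.12, -0.4, ...].

Loss function: compare the output ŷ with the true label y using cross-entropy, L = -Σ y·log(ŷ), and optimize the weights by backpropagation.

Cosine similarity: θ is the angle between two word vectors, here "sicherheit" (security) and "kriminalität" (crime). Cosine similarity = cos(θ), so a small θ means high similarity. Cosine distance = 1 - cos(θ).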
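
The two formulas in the diagram, the cross-entropy loss and the cosine similarity/distance, can be sketched in plain Python as follows. This is a minimal illustration, not the code behind the figure: the function names, the `eps` clamp, and the 3-dimensional example vectors are assumptions made here (the diagram shows 100-dimensional vectors and gives only the first two components of one of them).

```python
import math

def cross_entropy(y_true, y_pred, eps=1e-12):
    """L = -sum(y * log(y_hat)) over the output units.

    eps is an assumed safeguard that clamps predictions away from 0
    so that log() is always defined.
    """
    return -sum(y * math.log(max(p, eps)) for y, p in zip(y_true, y_pred))

def cosine_similarity(u, v):
    """cos(theta) = (u . v) / (|u| * |v|) for two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def cosine_distance(u, v):
    """Cosine distance = 1 - cos(theta)."""
    return 1.0 - cosine_similarity(u, v)

# Hypothetical, made-up vectors for illustration only.
vec_sicherheit = [0.12, -0.40, 0.30]
vec_kriminalitaet = [0.10, -0.35, 0.25]

similarity = cosine_similarity(vec_sicherheit, vec_kriminalitaet)
distance = cosine_distance(vec_sicherheit, vec_kriminalitaet)

# One-hot true label against a predicted probability distribution.
loss = cross_entropy([0, 1, 0], [0.2, 0.7, 0.1])
```

With a one-hot label, the sum reduces to the negative log of the probability assigned to the correct class, so the loss shrinks as that probability approaches 1. Vectors pointing in the same direction give a similarity of 1 and a distance of 0, while orthogonal vectors give a similarity of 0 and a distance of 1.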