Word2vec

#Machine Learning #Embedding #Word2vec

Word2vec is a word embedding model that learns the probability $p_{neighbours}(w_i, w_o)$ that two words $w_i$ and $w_o$ appear as neighbours in a sentence.

1. Build a dataset of adjacent words (CBOW or skip-gram, optionally with negative sampling).
2. Encode the words as vectors.
3. Build a model $f(\{\theta_i\})$ that calculates the probability of the words being neighbours, and improve the parameters $\{\theta_i\}$ using the dataset.
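The three steps can be sketched in plain Python. This is a minimal illustration, not the reference word2vec implementation: the function names (`build_pairs`, `init_vectors`, `neighbour_probs`, `sgd_step`, `mean_loss`), the toy sentence, the window size, the vector dimension, and the learning rate are all assumptions made for this example. The model here is a full softmax over dot products of two sets of vectors, which play the role of the parameters $\{\theta_i\}$.

```python
import math
import random

def build_pairs(tokens, window=1):
    """Step 1: collect (center, neighbour) pairs within `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

def init_vectors(vocab, dim, rng):
    """Step 2: encode each word as a small random vector."""
    return {w: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for w in vocab}

def neighbour_probs(center, in_vecs, out_vecs):
    """Step 3: softmax over dot products gives p(w_o | w_i) for every w_o."""
    v = in_vecs[center]
    scores = {w: sum(a * b for a, b in zip(v, u)) for w, u in out_vecs.items()}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {w: math.exp(s - top) for w, s in scores.items()}
    total = sum(exps.values())
    return {w: e / total for w, e in exps.items()}

def sgd_step(center, neighbour, in_vecs, out_vecs, lr=0.1):
    """One gradient step on the loss -log p(neighbour | center)."""
    probs = neighbour_probs(center, in_vecs, out_vecs)
    v = in_vecs[center]
    grad_v = [0.0] * len(v)
    for w, u in out_vecs.items():
        err = probs[w] - (1.0 if w == neighbour else 0.0)
        for k in range(len(v)):
            grad_v[k] += err * u[k]   # accumulate gradient for the center vector
            u[k] -= lr * err * v[k]   # update the output vector of word w
    for k in range(len(v)):
        v[k] -= lr * grad_v[k]

def mean_loss(pairs, in_vecs, out_vecs):
    """Average negative log-likelihood of the observed pairs."""
    total = sum(-math.log(neighbour_probs(c, in_vecs, out_vecs)[n]) for c, n in pairs)
    return total / len(pairs)

# Toy usage (assumed sentence and hyperparameters).
tokens = "the cat sat on the mat".split()
pairs = build_pairs(tokens, window=1)
rng = random.Random(0)
vocab = sorted(set(tokens))
in_vecs = init_vectors(vocab, dim=8, rng=rng)
out_vecs = init_vectors(vocab, dim=8, rng=rng)

loss_before = mean_loss(pairs, in_vecs, out_vecs)
for _ in range(200):
    for center, neighbour in pairs:
        sgd_step(center, neighbour, in_vecs, out_vecs)
loss_after = mean_loss(pairs, in_vecs, out_vecs)
```

The full softmax touches every word in the vocabulary on each step, which is affordable for a toy corpus but is the cost that negative sampling is meant to avoid.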

Planted: 2019-06-13 by L Ma;

Dynamic Backlinks to wiki/machine-learning/embedding/word2vec:

Improving Document Ranking with Dual Word Embeddings

Word2vec produces two embedding spaces, the in-embedding and out-embedding.
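One common way to write the model with these two spaces is the skip-gram softmax. The symbols $v_w$ (in-embedding), $u_w$ (out-embedding), and the vocabulary $V$ are notation assumed here for illustration; they do not appear in the note above.

$$
p(w_o \mid w_i) = \frac{\exp\left(u_{w_o}^\top v_{w_i}\right)}{\sum_{w \in V} \exp\left(u_w^\top v_{w_i}\right)}
$$

Each word therefore has two vectors: $v_w$ when it acts as the input word and $u_w$ when it acts as the output word.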

wiki/machine-learning/embedding/word2vec Links to:

CBOW: Continuous Bag of Words

Use the context to predict the center word

skipgram: Continuous skip-gram

Use the center word to predict the context
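The difference between the two schemes shows up in the shape of the training examples. The sketch below builds both from the same sentence; the helper names (`context_window`, `cbow_examples`, `skipgram_examples`) and the window size are assumptions made for this illustration.

```python
def context_window(tokens, i, window):
    """Words within `window` positions of index i, excluding tokens[i]."""
    left = tokens[max(0, i - window):i]
    right = tokens[i + 1:i + 1 + window]
    return left + right

def cbow_examples(tokens, window=2):
    """CBOW: the whole context is the input, the center word is the target."""
    return [(context_window(tokens, i, window), tokens[i])
            for i in range(len(tokens))]

def skipgram_examples(tokens, window=2):
    """Skip-gram: the center word is the input, each context word is a target."""
    return [(tokens[i], c)
            for i in range(len(tokens))
            for c in context_window(tokens, i, window)]

tokens = ["the", "quick", "brown", "fox"]
cbow = cbow_examples(tokens, window=1)
# [(['quick'], 'the'), (['the', 'brown'], 'quick'),
#  (['quick', 'fox'], 'brown'), (['brown'], 'fox')]
skipgram = skipgram_examples(tokens, window=1)
# [('the', 'quick'), ('quick', 'the'), ('quick', 'brown'),
#  ('brown', 'quick'), ('brown', 'fox'), ('fox', 'brown')]
```

CBOW yields one example per position, with a variable-length context as input, while skip-gram yields one example per (center, context) pair.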

Negative Sampling

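Negative sampling makes training faster: each update compares the observed word pair against a few sampled non-neighbour words instead of normalizing over the whole vocabulary.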
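A single negative-sampling update might look like the sketch below. It is an illustration under stated assumptions: the names (`sample_negatives`, `negative_sampling_step`), the toy vocabulary, the vector dimension, and the learning rate are hypothetical, and negatives are drawn uniformly here for simplicity, whereas the original word2vec draws them from a smoothed unigram distribution. The point is that only the true context word and `k` sampled words are touched per step.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sample_negatives(vocab, exclude, k, rng):
    """Draw k words uniformly (an assumption), never the true context word."""
    candidates = [w for w in vocab if w != exclude]
    return [rng.choice(candidates) for _ in range(k)]

def negative_sampling_step(center, context, negatives, in_vecs, out_vecs, lr=0.1):
    """One update on -log s(u_ctx . v) - sum over negatives of log s(-u_neg . v),
    where s is the sigmoid. Returns the loss measured before the update."""
    v = in_vecs[center]
    grad_v = [0.0] * len(v)
    loss = 0.0
    labelled = [(context, 1.0)] + [(n, 0.0) for n in negatives]
    for word, label in labelled:
        u = out_vecs[word]
        p = sigmoid(dot(u, v))
        loss -= math.log(p if label == 1.0 else 1.0 - p)
        err = p - label
        for k in range(len(v)):
            grad_v[k] += err * u[k]   # gradient for the center (input) vector
            u[k] -= lr * err * v[k]   # update only this output vector
    for k in range(len(v)):
        v[k] -= lr * grad_v[k]
    return loss

# Toy usage (assumed vocabulary and hyperparameters).
rng = random.Random(0)
vocab = ["the", "cat", "sat", "on", "mat"]
in_vecs = {w: [rng.uniform(-0.5, 0.5) for _ in range(8)] for w in vocab}
out_vecs = {w: [rng.uniform(-0.5, 0.5) for _ in range(8)] for w in vocab}

negatives = sample_negatives(vocab, exclude="sat", k=2, rng=rng)
losses = [negative_sampling_step("cat", "sat", negatives, in_vecs, out_vecs)
          for _ in range(200)]
```

Each step costs on the order of `k + 1` dot products, independent of the vocabulary size, which is where the speed-up over the full softmax comes from.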

L Ma (2019). 'Word2vec', Datumorphism, 06 April. Available at: https://datumorphism.leima.is/wiki/machine-learning/embedding/word2vec/.