Negative Sampling
Knowledge of [[CBOW]] (Continuous Bag of Words: use the context to predict the center word) or [[skipgram]] (Continuous skip-gram: use the center word to predict the context) is required.
A naive way to train a word embedding model is to
- encode input words and output words as vectors,
- use the input word vector to predict the output word vector,
- calculate the error between the predicted output word vector and the real output word vector,
- minimize the error.
However, projecting onto the output vocabulary and computing the error is very expensive: every training step requires a softmax over the whole vocabulary. Negative sampling is a trick to avoid this cost.
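To make the cost concrete, here is a minimal sketch of one naive skip-gram update with a full softmax over a toy vocabulary. All names and sizes (`V`, `D`, `W_in`, `W_out`) are illustrative assumptions, not part of any particular library.

```python
import numpy as np

V, D = 10_000, 300                          # toy vocabulary size, embedding dimension
rng = np.random.default_rng(0)
W_in = rng.normal(scale=0.1, size=(V, D))   # input (center-word) embeddings
W_out = rng.normal(scale=0.1, size=(V, D))  # output (context-word) embeddings

center_id, context_id = 123, 456            # one (center, context) training pair

h = W_in[center_id]                         # project the center word: shape (D,)
scores = W_out @ h                          # scores for ALL V words: shape (V,) -- the costly step
probs = np.exp(scores - scores.max())
probs /= probs.sum()                        # softmax over the full vocabulary

# The cross-entropy gradient touches every row of W_out,
# so each training pair costs O(V * D).
grad_scores = probs.copy()
grad_scores[context_id] -= 1.0              # dL/dscores for the true context word
grad_W_out = np.outer(grad_scores, h)       # (V, D) update -- every output vector moves
```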
Negative sampling reframes the task as binary classification: we add a new target column to the data that indicates whether the output word is a true neighbour of the input word.
| Input (Center Word) | Output (Context) | Target (is Neighbour) |
|---|---|---|
| intended | extravagant | 1 |
| intended | display | 1 |
| intended | to | 1 |
| intended | attract | 1 |
Now we have a problem: the target is always 1. A network trained on this dataset could simply output 1 all the time. We need some negative samples to break this degeneracy, so we randomly sample words from the vocabulary and label them as non-neighbours (target 0).
| Input (Center Word) | Output (Context) | Target (is Neighbour) |
|---|---|---|
| intended | extravagant | 1 |
| intended | display | 1 |
| intended | to | 1 |
| intended | attract | 1 |
| intended | I | 0 |
| intended | a | 0 |
| intended | intellect | 0 |
| intended | mating | 0 |
| intended | course | 0 |
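With this relabelled dataset, each training pair only needs to score the one true context word plus a handful of sampled noise words. Below is a minimal sketch of one skip-gram step with negative sampling under the same toy setup as above; the number of negatives `k`, the learning rate, and the uniform negative sampler are illustrative assumptions (word2vec samples negatives from a smoothed unigram distribution).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

V, D, k, lr = 10_000, 300, 5, 0.025
rng = np.random.default_rng(0)
W_in = rng.normal(scale=0.1, size=(V, D))
W_out = rng.normal(scale=0.1, size=(V, D))

center_id, context_id = 123, 456            # positive pair, target = 1
negative_ids = rng.integers(0, V, size=k)   # k random words, target = 0
                                            # (may collide with context_id; real code resamples)

h = W_in[center_id]                                 # center-word vector, shape (D,)
ids = np.concatenate(([context_id], negative_ids))  # k + 1 output words to score
targets = np.array([1.0] + [0.0] * k)               # binary labels instead of a V-way softmax

preds = sigmoid(W_out[ids] @ h)             # only k + 1 dot products, not V
errors = preds - targets                    # gradient of binary cross-entropy w.r.t. the scores

# Update only the k + 1 sampled output vectors and the single input vector.
grad_h = errors @ W_out[ids]
W_out[ids] -= lr * np.outer(errors, h)
W_in[center_id] -= lr * grad_h
```

Each training pair now costs roughly O(k · D) instead of O(V · D), which is what makes training on large corpora feasible.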
For a more rigorous derivation, see Goldberg & Levy (2014).
- Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. arXiv:1301.3781
- Goldberg, Y., & Levy, O. (2014). word2vec Explained: deriving Mikolov et al.'s negative-sampling word-embedding method. arXiv:1402.3722. Available: http://arxiv.org/abs/1402.3722
- The Illustrated Word2vec