Negative Sampling
#Word2vec
A naive way to train a word embedding model is to
 encode input words and output words as vectors,
 use the input word vector to predict the output word vector,
 calculate the error between the predicted output word vector and the real output word vector,
 minimize the error.
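
The steps above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the toy vocabulary, the vector size, the learning rate, and the helper name `naive_step` are all hypothetical, and a softmax with cross-entropy is assumed as the way to measure the error.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy vocabulary; the sizes are illustrative assumptions.
vocab = ["intended", "extravagant", "display", "to", "attract", "I", "a"]
word_to_id = {w: i for i, w in enumerate(vocab)}
vocab_size, dim = len(vocab), 4

# Step 1: encode input words and output words as vectors.
input_vectors = rng.normal(scale=0.1, size=(vocab_size, dim))
output_vectors = rng.normal(scale=0.1, size=(vocab_size, dim))


def naive_step(center, context, lr=0.1):
    """One full-softmax gradient step; returns the loss before the update."""
    c, o = word_to_id[center], word_to_id[context]
    v = input_vectors[c].copy()

    # Step 2: score every word in the vocabulary against the input vector.
    scores = output_vectors @ v  # shape: (vocab_size,)
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()

    # Step 3: error between the prediction and the real output word.
    loss = -np.log(probs[o])
    error = probs.copy()
    error[o] -= 1.0  # gradient of the loss with respect to the scores

    # Step 4: minimize the error. Every output vector is updated.
    input_vectors[c] -= lr * (output_vectors.T @ error)
    output_vectors[:] -= lr * np.outer(error, v)
    return float(loss)


losses = [naive_step("intended", "display") for _ in range(200)]
```

Note that steps 2 and 4 both touch all `vocab_size` output vectors for a single training pair, so the work per step grows with the size of the vocabulary.
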
However, it is very expensive to project onto all the output words and calculate the error every time, because each training step has to score every word in the vocabulary. A trick is to use negative sampling.
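Negative sampling changes the task: instead of predicting the output word, the model takes an input word and a candidate output word and predicts whether they are neighbours. This adds a new column to the data as the prediction target.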

| Input (Center Word) | Output (Context) | Target (is Neighbour) |
|---|---|---|
| intended | extravagant | 1 |
| intended | display | 1 |
| intended | to | 1 |
| intended | attract | 1 |

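
Rows like the ones above can be produced by sliding a window over the text. The sketch below is a minimal illustration, not a reference implementation: the function name `positive_pairs`, the window size of 2, and the word order of `tokens` are assumptions chosen so that "intended" ends up with the four neighbours shown in the table.

```python
def positive_pairs(tokens, window=2):
    """Return (center, context, target) rows with the target fixed to 1."""
    rows = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own neighbour
                rows.append((center, tokens[j], 1))
    return rows


# Assumed word order, picked to match the table; not taken from the source.
tokens = ["extravagant", "display", "intended", "to", "attract"]
rows = [r for r in positive_pairs(tokens) if r[0] == "intended"]
for row in rows:
    print(row)
```

Every row produced this way has target 1, because the window only ever yields true neighbours.
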
Now we have a problem: the target is always 1, so this dataset might lead to a network that outputs 1 all the time. We need some negative samples to make it noisy, so we add rows with target 0 whose output words are randomly sampled from the dictionary.

| Input (Center Word) | Output (Context) | Target (is Neighbour) |
|---|---|---|
| intended | extravagant | 1 |
| intended | display | 1 |
| intended | to | 1 |
| intended | attract | 1 |
| intended | I | 0 |
| intended | a | 0 |
| intended | intellect | 0 |
| intended | mating | 0 |
| intended | course | 0 |

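
A dataset like the one above, and a model trained on it, might be sketched as follows. Several things here are assumptions rather than part of the original note: the helper names `add_negative_samples` and `train`, uniform sampling from the dictionary (practical implementations often weight the sampling by word frequency), skipping words that are true neighbours, and the use of a sigmoid over the dot product as the predicted target.

```python
import random

import numpy as np


def add_negative_samples(positive_rows, vocabulary, k=5, seed=0):
    """Append up to k rows with target 0 for each center word."""
    rng = random.Random(seed)
    # Collect true neighbours so that none of them is labelled 0 (assumption).
    neighbours = {}
    for center, context, _ in positive_rows:
        neighbours.setdefault(center, set()).add(context)
    rows = list(positive_rows)
    for center, contexts in neighbours.items():
        candidates = [w for w in vocabulary if w != center and w not in contexts]
        # Uniform sampling without replacement (assumption).
        for word in rng.sample(candidates, min(k, len(candidates))):
            rows.append((center, word, 0))
    return rows


def train(rows, vocabulary, dim=8, lr=0.1, epochs=300, seed=0):
    """Fit input and output vectors with a logistic loss on the target column."""
    rng = np.random.default_rng(seed)
    ids = {w: i for i, w in enumerate(vocabulary)}
    inp = rng.normal(scale=0.1, size=(len(vocabulary), dim))
    out = rng.normal(scale=0.1, size=(len(vocabulary), dim))
    for _ in range(epochs):
        for center, context, target in rows:
            c, o = ids[center], ids[context]
            pred = 1.0 / (1.0 + np.exp(-(inp[c] @ out[o])))  # P(is neighbour)
            grad = pred - target  # gradient of the loss w.r.t. the score
            # Only the two vectors in this row are updated, not the whole vocabulary.
            inp[c], out[o] = inp[c] - lr * grad * out[o], out[o] - lr * grad * inp[c]

    def predict(center, context):
        score = inp[ids[center]] @ out[ids[context]]
        return float(1.0 / (1.0 + np.exp(-score)))

    return predict


vocabulary = ["intended", "extravagant", "display", "to", "attract",
              "I", "a", "intellect", "mating", "course"]
positives = [("intended", w, 1) for w in ["extravagant", "display", "to", "attract"]]
dataset = add_negative_samples(positives, vocabulary, k=5)
predict = train(dataset, vocabulary)
```

With this toy dictionary only five words are left as candidates, so the sampled negatives coincide with the rows of the table. Each training step updates one input vector and one output vector, which is what makes this cheaper than scoring the full vocabulary.
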
Published:
by L Ma;
L Ma (2020). 'Negative Sampling', Datumorphism, 01 April. Available at: https://datumorphism.leima.is/cards/machinelearning/embedding/negativesampling/.