# Negative Sampling

## Word2vec

A naive way to train a model of words is to

- encode input words and output words using vectors,
- use the input word vector to predict the output word vector,
- calculate the errors between predicted output word vector and real output word vector,
- minimize the errors.
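The steps above can be sketched in a few lines. This is a minimal illustration, not the note's own implementation: it assumes the prediction is a softmax over the whole vocabulary and the error is a cross-entropy, and the toy vocabulary, the embedding size `dim`, and all function names are made up for the example.

```python
import math
import random

random.seed(0)  # only to make the sketch repeatable

# Hypothetical toy vocabulary; a real one has tens of thousands of words.
vocab = ["intended", "extravagant", "display", "to", "attract",
         "I", "a", "intellect", "mating", "course"]
dim = 4  # embedding size, chosen arbitrarily


def random_vector(n):
    return [random.uniform(-0.5, 0.5) for _ in range(n)]


# One vector per word as an input word, one per word as an output word.
input_vectors = {w: random_vector(dim) for w in vocab}
output_vectors = {w: random_vector(dim) for w in vocab}


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def softmax_probabilities(center):
    """Score the center word against every word in the vocabulary."""
    scores = [dot(input_vectors[center], output_vectors[w]) for w in vocab]
    largest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return {w: e / total for w, e in zip(vocab, exps)}


def naive_loss(center, context):
    """Cross-entropy error for a single (center, context) pair."""
    return -math.log(softmax_probabilities(center)[context])


probs = softmax_probabilities("intended")
loss = naive_loss("intended", "extravagant")
```

Every call to `naive_loss` computes one dot product per vocabulary word, so the work for a single training pair grows with the size of the vocabulary.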

However, it is very expensive to project onto all the output words and calculate the error every time. A trick is to use **negative sampling**.

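Negative sampling reframes the task: instead of predicting the output word, the model takes a pair of input and output words and predicts whether they are neighbours. This adds a new target column to the data.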

Input (Center Word) | Output (Context) | Target (is Neighbour) |
---|---|---|
`intended` | `extravagant` | 1 |
`intended` | `display` | 1 |
`intended` | `to` | 1 |
`intended` | `attract` | 1 |
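Rows like these can be produced by sliding a window over the text. The sketch below is illustrative only: the token sequence and the window size of 2 are assumptions chosen so that the rows for `intended` match the table, and `positive_pairs` is a hypothetical helper name.

```python
def positive_pairs(tokens, window=2):
    """Return (center, context, 1) rows for every word within the window."""
    pairs = []
    for i, center in enumerate(tokens):
        start = max(0, i - window)
        stop = min(len(tokens), i + window + 1)
        for j in range(start, stop):
            if j != i:  # a word is not its own neighbour
                pairs.append((center, tokens[j], 1))
    return pairs


# Assumed source fragment; the original text is not given in this note.
tokens = ["extravagant", "display", "intended", "to", "attract"]
pairs = positive_pairs(tokens)
intended_rows = [row for row in pairs if row[0] == "intended"]
```

Here `intended_rows` holds the four rows of the table, each with target 1.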

Now we have a problem: the target is always 1. This dataset might lead to a network that outputs 1 all the time. We need some negative samples to add noise, so we randomly sample words from the dictionary and give them target 0.

Input (Center Word) | Output (Context) | Target (is Neighbour) |
---|---|---|
`intended` | `extravagant` | 1 |
`intended` | `display` | 1 |
`intended` | `to` | 1 |
`intended` | `attract` | 1 |
`intended` | `I` | 0 |
`intended` | `a` | 0 |
`intended` | `intellect` | 0 |
`intended` | `mating` | 0 |
`intended` | `course` | 0 |
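A minimal sketch of how such negative rows could be generated is shown below. It assumes uniform random sampling from a toy dictionary and skips the center word and its true neighbours; both choices, the dictionary contents, and the helper name `add_negative_samples` are assumptions for illustration.

```python
import random


def add_negative_samples(positives, vocabulary, num_negatives, seed=0):
    """Append (center, word, 0) rows for randomly drawn non-neighbours."""
    rng = random.Random(seed)  # seeded only to make the sketch repeatable
    rows = list(positives)
    centers = sorted({center for center, _, _ in positives})
    for center in centers:
        neighbours = {ctx for c, ctx, _ in positives if c == center}
        # Assumption: never draw the center word or one of its true neighbours.
        candidates = [w for w in vocabulary
                      if w != center and w not in neighbours]
        k = min(num_negatives, len(candidates))
        for word in rng.sample(candidates, k):
            rows.append((center, word, 0))
    return rows


positives = [
    ("intended", "extravagant", 1),
    ("intended", "display", 1),
    ("intended", "to", 1),
    ("intended", "attract", 1),
]
# Hypothetical toy dictionary; a real one would hold the corpus vocabulary.
vocabulary = ["intended", "extravagant", "display", "to", "attract",
              "I", "a", "intellect", "mating", "course", "of", "the"]

rows = add_negative_samples(positives, vocabulary, num_negatives=5)
for row in rows:
    print(row)
```

The number of negatives per center word is a tunable choice. The original word2vec implementation also does not sample uniformly: it draws negatives from the unigram distribution raised to the 3/4 power, so frequent words are picked more often than in this sketch.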

Published by Lei Ma.

Lei Ma (2020). 'Negative Sampling', Datumorphism, 01 April. Available at: https://datumorphism.leima.is/cards/machine-learning/embedding/negative-sampling/.

