skipgram: Continuous skip-gram
We use the following quote by Ford in Westworld as an example.
I read a theory once that the human intellect is like peacock feathers. Just an extravagant display intended to attract a mate, just an elaborate mating ritual. But, of course, the peacock can barely fly. It lives in the dirt, pecking insects out of the muck, consoling itself with its great beauty.
The word `intended` is surrounded by `extravagant display` in front and `to attract` after it. The task is to predict the probability of the words around the center word `intended`: the ‘history words’ `extravagant` and `display`, and the ‘future words’ `to` and `attract` in our case [mikolov2013].
For this center word `intended`, we generate the following data.
Input (Center Word) | Output (Context) |
---|---|
intended | extravagant |
intended | display |
intended | to |
intended | attract |
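
The prediction task can also be written as an objective. The form below follows [mikolov2013]; the symbols are introduced here for illustration and are not part of the example above: $w_1, \dots, w_T$ is the training text, $c$ is the number of context words taken on each side, and $p(w_{t+j} \mid w_t)$ is the model's probability of a context word given the center word. Under these assumptions, skip-gram maximizes the average log probability

$$
\frac{1}{T} \sum_{t=1}^{T} \sum_{\substack{-c \le j \le c \\ j \ne 0}} \log p(w_{t+j} \mid w_t).
$$

For the center word `intended` with $c = 2$, the inner sum has four terms, one for each row of the table above.
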
Applying the same procedure to every word in the quote, with up to two words on each side of the center word, builds the following dataset.
Input (Center Word) | Output (Context) |
---|---|
I | read |
I | a |
read | I |
read | a |
read | theory |
a | I |
a | read |
a | theory |
a | once |
theory | read |
theory | a |
theory | once |
… | … |
The context does not have to be two words before and two words after the center word. The window size, i.e. the number of context words taken on each side, is a hyperparameter to be chosen.
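
The pair generation used for the tables above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `skipgram_pairs`, the `window` parameter, and the whitespace tokenization without punctuation are all assumptions made for this example.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for every token.

    `window` is the number of context words taken on each side of the
    center word (a hypothetical parameter name for the hyperparameter).
    """
    pairs = []
    for i, center in enumerate(tokens):
        # Clip the window at the beginning and end of the token list.
        start = max(0, i - window)
        stop = min(len(tokens), i + window + 1)
        for j in range(start, stop):
            if j != i:  # skip the center word itself
                pairs.append((center, tokens[j]))
    return pairs


# Assumption: naive whitespace tokenization, punctuation dropped by hand.
tokens = (
    "I read a theory once that the human intellect is like peacock feathers "
    "Just an extravagant display intended to attract a mate"
).split()

pairs = skipgram_pairs(tokens, window=2)

# Context words generated for the center word "intended".
intended_context = [context for center, context in pairs if center == "intended"]
print(intended_context)
```

Calling `skipgram_pairs(tokens, window=1)` or `window=3` on the same tokens yields narrower or wider contexts, which is what treating the window size as a hyperparameter means in practice.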
L Ma (2020). 'skipgram: Continuous skip-gram', Datumorphism, 01 April. Available at: https://datumorphism.leima.is/cards/machine-learning/embedding/continuous-skip-gram/.