f-GAN
The essence of [[GAN]] is comparing the generated distribution $p_G$ and the data distribution $p_\text{data}$. The vanilla GAN considers the Jensen-Shannon divergence $\operatorname{D}_\text{JS}(p_\text{data}\Vert p_{G})$. The discriminator ${\color{green}D}$ serves the purpose of forcing this divergence to be small.
The JS divergence is a special case of a more general family called [[f-divergence]]^{1}. f-GAN obtains the model by estimating the f-divergence between the data distribution and the generated distribution^{2}.
Variational Divergence Minimization
The Variational Divergence Minimization (VDM) extends the variational estimation of f-divergence^{2}. VDM searches for the saddle point of an objective $F({\color{red}\theta}, {\color{blue}\omega})$, i.e., a minimum w.r.t. ${\color{red}\theta}$ and a maximum w.r.t. ${\color{blue}\omega}$, where ${\color{red}\theta}$ is the parameter set of the generator ${\color{red}Q_\theta}$, and ${\color{blue}\omega}$ is the parameter set of the variational function ${\color{blue}T_\omega}$ used to estimate the f-divergence.
The objective $F({\color{red}\theta}, {\color{blue}\omega})$ depends on the choice of $f$ in the f-divergence and on the variational function ${\color{blue}T}$,
$$ \begin{align} & F(\theta, \omega)\\ =& \mathbb E_{x\sim p_\text{data}} \left[ {\color{blue}T_\omega}(x) \right] - \mathbb E_{x\sim {\color{red}Q_\theta} } \left[ f^*({\color{blue}T_\omega}(x)) \right] \\ =& \mathbb E_{x\sim p_\text{data}} \left[ g_f(V_{\color{blue}\omega}(x)) \right] - \mathbb E_{x\sim {\color{red}Q_\theta} } \left[ f^*(g_f(V_{\color{blue}\omega}(x))) \right]. \end{align} $$
In the above objective,
- $f^*$ is the Legendre–Fenchel transformation of $f$, i.e., $f^*(t) = \operatorname{sup}_{u\in \mathrm{dom}_f}\left\{ ut - f(u) \right\}$.
$T$, $g_f$, $V$
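The function $T$ provides a variational lower bound on the f-divergence[^Nowozin2016]. It is written as ${\color{blue}T_\omega}(x) = g_f(V_{\color{blue}\omega}(x))$, where $V_{\color{blue}\omega}$ is the unconstrained output of the network and $g_f$ is an output activation that maps it into the domain of $f^*$.
Nowozin et al. provide a table of suitable $g_f$ for common choices of $f$[^Nowozin2016].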
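The Legendre–Fenchel transformation in the objective can be checked numerically. The sketch below assumes the KL generating function $f(u) = u\log u$, whose conjugate is known to be $f^*(t) = e^{t-1}$, and approximates the supremum by a maximum over a finite grid. The grid bounds and the helper name `conjugate_on_grid` are illustrative choices, not part of the original method.

```python
import numpy as np


def f_kl(u):
    """Generating function of the KL divergence, f(u) = u log u."""
    return u * np.log(u)


def conjugate_on_grid(f, t, u_grid):
    """Approximate f^*(t) = sup_u {u t - f(u)} by a maximum over a finite grid of u."""
    return np.max(u_grid * t - f(u_grid))


# Assumed grid over part of dom_f = (0, inf); wide enough for the t values below.
u_grid = np.linspace(1e-6, 20.0, 200_001)
t_values = np.array([-1.0, 0.0, 1.0, 2.0])

numeric = np.array([conjugate_on_grid(f_kl, t, u_grid) for t in t_values])
closed_form = np.exp(t_values - 1.0)  # analytic conjugate of u log u

max_abs_error = np.max(np.abs(numeric - closed_form))
print(max_abs_error)
```

The grid maximum only approaches the supremum from below, so the numeric value never exceeds the closed form. For larger $t$ the maximizer $u = e^{t-1}$ leaves the grid and the approximation breaks down.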
We estimate
- $\mathbb E_{x\sim p_\text{data}}$ by sampling from the minibatch, and
- $\mathbb E_{x\sim {\color{red}Q_\theta} }$ by sampling from the generator.
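The two sample averages can be illustrated on a toy problem. This minimal sketch assumes a KL setup with $p_\text{data} = \mathcal N(0, 1)$ and a fixed "generator" $Q = \mathcal N(1, 1)$. Instead of a trained network it plugs in the known optimal variational function for KL, $T(x) = 1 + \log\left(p_\text{data}(x)/q(x)\right)$, for which the objective equals the KL divergence (here exactly $0.5$). In f-GAN itself, $T_\omega$ is a learned network and the second batch comes from the generator.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000  # batch size; large here only to keep the Monte Carlo noise small

# Assumed toy distributions: p_data = N(0, 1), Q = N(1, 1).
x_data = rng.normal(0.0, 1.0, size=n)  # stands in for a minibatch of data
x_gen = rng.normal(1.0, 1.0, size=n)   # stands in for samples from the generator


def log_ratio(x):
    """log p_data(x) - log q(x) for the two unit-variance Gaussians above."""
    return -0.5 * x**2 + 0.5 * (x - 1.0) ** 2


def T_opt(x):
    """Optimal variational function for KL: T(x) = f'(p/q) = 1 + log(p/q)."""
    return 1.0 + log_ratio(x)


def f_star(t):
    """Conjugate of f(u) = u log u."""
    return np.exp(t - 1.0)


# F = E_{p_data}[T(x)] - E_{Q}[f^*(T(x))], both estimated by sample means.
F_hat = np.mean(T_opt(x_data)) - np.mean(f_star(T_opt(x_gen)))
kl_exact = 0.5  # KL(N(0,1) || N(1,1))
print(F_hat, kl_exact)
```

Any other choice of $T$ gives a value no larger than the true divergence in expectation, which is why the objective is maximized over $\omega$.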
Reduce to GAN
The VDM loss can be reduced to the [[loss of GAN]] by setting^{2}
$$ \begin{align} \log {\color{green}D_\omega} =& g_f(V_{\color{blue}\omega}(x)) \\ \log \left( 1 - {\color{green}D_\omega} \right) =& -f^*\left( g_f(V_{\color{blue}\omega}(x)) \right). \end{align} $$
With the conjugate of the GAN divergence, $f^*(t) = -\log\left(1 - e^{t}\right)$[^Nowozin2016], it is straightforward to verify that the following choice solves the above set of equations,
$$ g_f(V) = \log \frac{1}{1 + e^{-V}}. $$
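The reduction can be checked numerically. The sketch below assumes the sigmoid discriminator $D = 1/(1 + e^{-V})$, the output activation $g_f(V) = -\log(1 + e^{-V})$, and the GAN conjugate $f^*(t) = -\log(1 - e^{t})$ from the table in Nowozin et al. It then compares both sides of $\log D = g_f(V)$ and $\log(1 - D) = -f^*(g_f(V))$ on a grid of $V$ values. The grid itself is an arbitrary illustrative choice.

```python
import numpy as np


def g_f(v):
    """GAN output activation: log(1 / (1 + e^{-v})) = -log(1 + e^{-v})."""
    return -np.logaddexp(0.0, -v)


def f_star(t):
    """Conjugate of the GAN divergence, f^*(t) = -log(1 - e^t), defined for t < 0."""
    return -np.log1p(-np.exp(t))


v = np.linspace(-5.0, 5.0, 101)   # assumed range of raw network outputs V
D = 1.0 / (1.0 + np.exp(-v))      # sigmoid discriminator

lhs_first, rhs_first = np.log(D), g_f(v)
lhs_second, rhs_second = np.log(1.0 - D), -f_star(g_f(v))

print(np.max(np.abs(lhs_first - rhs_first)))
print(np.max(np.abs(lhs_second - rhs_second)))
```

Both differences should be at the level of floating-point rounding, so with these choices the VDM objective coincides with the usual GAN objective $\mathbb E[\log D] + \mathbb E[\log(1 - D)]$.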

fdivergence_wiki Contributors to Wikimedia projects. F-divergence. In: Wikipedia [Internet]. 17 Jul 2021 [cited 6 Sep 2021]. Available: https://en.wikipedia.org/wiki/Fdivergence#Instances_of_fdivergences

Nowozin2016 Nowozin S, Cseke B, Tomioka R. f-GAN: Training Generative Neural Samplers using Variational Divergence Minimization. arXiv [stat.ML]. 2016. Available: http://arxiv.org/abs/1606.00709
 Liu2020 Liu X, Zhang F, Hou Z, Wang Z, Mian L, Zhang J, et al. Self-supervised Learning: Generative or Contrastive. arXiv [cs.LG]. 2020. Available: http://arxiv.org/abs/2006.08218
 convex_conjugate_wiki Contributors to Wikimedia projects. Convex conjugate. In: Wikipedia [Internet]. 20 Feb 2021 [cited 7 Sep 2021]. Available: https://en.wikipedia.org/wiki/Convex_conjugate
L Ma (2021). 'f-GAN', Datumorphism, 08 April. Available at: https://datumorphism.leima.is/wiki/machinelearning/adversarialmodels/fgan/.