
In a GAN, the latent-space input is usually random noise, e.g., Gaussian noise. The GAN objective is a very generic one: it says nothing about how exactly the latent space will be used. This is undesirable in many problems, where we would like the latent space to be interpretable. InfoGAN adds a constraint to the objective to enforce interpretability of the latent space[^1].


The constraint InfoGAN proposes is based on [[Mutual Information]]. The objective becomes

$$ \underset{{\color{red}G}}{\operatorname{min}} \underset{{\color{green}D}}{\operatorname{max}} V_I ({\color{green}D}, {\color{red}G}) = V({\color{green}D}, {\color{red}G}) - \lambda I(c; {\color{red}G}(z,c)), $$

where

- $c$ is the latent code,
- $z$ is the random noise input,
- $V({\color{green}D}, {\color{red}G})$ is the objective of the GAN,
- $I(c; {\color{red}G}(z,c))$ is the mutual information between the input latent code and the generated data.
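To make the $I(c; {\color{red}G}(z,c))$ term concrete, here is a minimal sketch that evaluates mutual information, $I(X;Y) = \mathbb E_{p_{XY}} \ln \frac{P_{XY}}{P_X P_Y}$, for a discrete toy case. It assumes the joint distribution of the code and the generator output is known as a table; in real training it is not, and the InfoGAN paper approximates the term with a variational lower bound using an auxiliary distribution $Q$. The function name and both toy tables are hypothetical.

```python
import math

def mutual_information(joint):
    """Mutual information I(C; X) in nats for a discrete joint table.

    joint[i][j] is P(C = i, X = j); entries are assumed to sum to 1.
    """
    p_c = [sum(row) for row in joint]        # marginal P(C)
    p_x = [sum(col) for col in zip(*joint)]  # marginal P(X)
    mi = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0:  # terms with p = 0 contribute 0 (0 * ln 0 := 0)
                mi += p * math.log(p / (p_c[i] * p_x[j]))
    return mi

# Hypothetical toy setup: a uniform binary code c and a "generator"
# whose output X takes one of two values.
ignores_code = [[0.25, 0.25],
                [0.25, 0.25]]  # X is independent of c
encodes_code = [[0.5, 0.0],
                [0.0, 0.5]]    # X determines c exactly

print(mutual_information(ignores_code))  # 0.0
print(mutual_information(encodes_code))  # ln 2, about 0.693
```

A generator that ignores $c$ gets $I = 0$, so the constraint gives it nothing; one whose output reveals $c$ exactly reaches the maximum, $H(c) = \ln 2$ for a fair binary code.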

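The multiplier $\lambda$ weights the mutual-information term. Because ${\color{red}G}$ minimizes $V_I$, subtracting $\lambda I$ rewards the generator for keeping information about the latent code $c$ in its output and penalizes it when that information is lost.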
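A short bound shows how far this term can move the objective. As an illustrative assumption, take $c$ to be a categorical code with $K$ equally likely values (InfoGAN also allows continuous codes, for which this bound does not apply). Since conditional entropy of a discrete variable is non-negative,

$$
\begin{aligned}
I(c; {\color{red}G}(z,c)) &= H(c) - H(c \mid {\color{red}G}(z,c)) \le H(c) = \ln K, \\
V_I({\color{green}D}, {\color{red}G}) &= V({\color{green}D}, {\color{red}G}) - \lambda I(c; {\color{red}G}(z,c)) \ge V({\color{green}D}, {\color{red}G}) - \lambda \ln K,
\end{aligned}
$$

with equality when $c$ can be recovered exactly from ${\color{red}G}(z,c)$. The generator can therefore lower its objective by at most $\lambda \ln K$ through this term, which is one way to reason about the scale of $\lambda$.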


The training steps are almost the same as those of a GAN, but with one extra loss calculated in each mini-batch.

1. Train $\color{red}G$ using the loss $\operatorname{MSE}(v', v)$;
2. Train $\color{green}D$ using the loss $\operatorname{MSE}(v', v)$;
3. Apply the constraint:
    1. Sample data from the mini-batch;
    2. Calculate the loss $\lambda_{l} H(l';l)+\lambda_c \operatorname{MSE}(c,c')$.
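The losses in the steps above can be sketched in plain Python. This sketch assumes a least-squares setup in which $v$ is the target label (1 for real, 0 for generated) and $v'$ is the discriminator's output, that $l$ is a categorical code with $l'$ the predicted class probabilities, and that $c'$ is the prediction of the continuous code $c$. These readings of the symbols, the function names, and the default weights `lambda_l` and `lambda_c` are hypothetical. In a real model, $l'$ and $c'$ would come from an auxiliary network (in the InfoGAN paper it shares layers with $\color{green}D$), and an autodiff framework would backpropagate the losses.

```python
import math

def mse(pred, target):
    """Mean squared error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def cross_entropy(probs, labels):
    """Mean cross entropy H(l'; l): probs[i] is the predicted distribution
    over categories for sample i, labels[i] is its true category index."""
    return -sum(math.log(p[y]) for p, y in zip(probs, labels)) / len(labels)

def generator_loss(d_fake):
    """Step 1 (assumed least-squares form): G wants D to output 1 on fakes."""
    return mse(d_fake, [1.0] * len(d_fake))

def discriminator_loss(d_real, d_fake):
    """Step 2 (assumed least-squares form): D targets 1 on real, 0 on fake."""
    return mse(d_real, [1.0] * len(d_real)) + mse(d_fake, [0.0] * len(d_fake))

def info_loss(l_pred, l_true, c_pred, c_true, lambda_l=1.0, lambda_c=0.1):
    """Step 3: lambda_l * H(l'; l) + lambda_c * MSE(c, c').
    The default weights are hypothetical placeholders."""
    return (lambda_l * cross_entropy(l_pred, l_true)
            + lambda_c * mse(c_true, c_pred))

# Usage on a hypothetical mini-batch of two samples.
print(generator_loss([0.5, 0.25]))                    # 0.40625
print(discriminator_loss([0.75, 1.0], [0.5, 0.25]))  # 0.1875
print(info_loss([[0.5, 0.5], [0.25, 0.75]], [0, 1], [0.25, -0.5], [0.5, -0.5]))
```

Keeping the constraint in its own function mirrors step 3: it is computed on the same mini-batch and added on top of the usual adversarial losses, so the GAN part of the loop does not change.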



