Minimum Description Length (MDL) can be used to construct a concise network. A fully connected network has great expressive power, but it overfits easily.

One strategy is to apply constraints to the networks:

• Limit the connections;
• Share weights within subgroups of the network;
• Constrain the weights using probability distributions.
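
To make the first two constraints concrete, the sketch below counts the free weights of a single layer under each of them. The layer sizes, the window size `kernel`, and the function names are illustrative assumptions, not taken from any particular network; biases are ignored.

```python
def dense_params(n_in, n_out):
    """Weights in a fully connected layer: every input feeds every output."""
    return n_in * n_out


def local_params(n_in, kernel):
    """Limited connections: each output sees only `kernel` adjacent inputs,
    and every output has its own weights."""
    n_out = n_in - kernel + 1
    return n_out * kernel


def shared_params(kernel):
    """Shared weights: all outputs reuse the same `kernel` weights."""
    return kernel


# Hypothetical sizes, chosen only for illustration.
n_in, kernel = 100, 5
n_out = n_in - kernel + 1

print("fully connected:", dense_params(n_in, n_out))
print("limited connections:", local_params(n_in, kernel))
print("shared weights:", shared_params(kernel))
```

Fewer free weights means fewer numbers to describe, which is the sense in which a constrained network is more concise.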

By minimizing the description length of both the network and its misfit on the data, we can build a concise network. Based on the , we can encode the misfit and the model using Shannon information content, so the description length of each corresponds to its Shannon information content. We can then define an expected description length and minimize it, which balances the complexity of the model against its goodness of fit.
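
The trade-off can be illustrated with a minimal sketch. It assumes, purely for illustration, that both the weights and the misfits are coded under zero-mean Gaussian distributions and discretized to a fixed bin width `precision`, so that the Shannon information content of a value is approximately -log2(density × precision) bits. The function names, the standard deviations, and the example numbers are all hypothetical.

```python
import math


def gaussian_bits(x, sigma, precision):
    """Approximate Shannon information content, in bits, of a value x that is
    discretized to bins of width `precision` and coded under a zero-mean
    Gaussian with standard deviation `sigma` (assumed coding scheme).
    Computed in the log domain to avoid underflow for large |x|."""
    neg_log_density = 0.5 * (x / sigma) ** 2 + math.log(sigma * math.sqrt(2 * math.pi))
    return (neg_log_density - math.log(precision)) / math.log(2)


def description_length(weights, residuals, sigma_w=1.0, sigma_d=1.0, precision=0.01):
    """Total bits needed to send the model (weights) plus the misfit (residuals)."""
    model_bits = sum(gaussian_bits(w, sigma_w, precision) for w in weights)
    misfit_bits = sum(gaussian_bits(r, sigma_d, precision) for r in residuals)
    return model_bits + misfit_bits


# Hypothetical comparison: a small model with larger misfits versus
# a large model with smaller misfits on the same four data points.
small = description_length(weights=[0.5, -0.3], residuals=[0.8, -1.1, 0.9, -0.7])
large = description_length(weights=[0.5, -0.3, 1.2, -0.9, 0.4, 0.7],
                           residuals=[0.1, -0.2, 0.1, -0.1])
print(f"small model: {small:.1f} bits, large model: {large:.1f} bits")
```

Under this scheme every extra weight costs bits and every larger residual costs bits, so minimizing the sum favors a model that is neither too complex nor too poorly fitted.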
