Artificial Neural Networks
#Machine Learning #Artificial Neural Networks #Basics
Artificial neural networks work quite well for solving some differential equations.
Universal Approximators
Maxwell Stinchcombe and Halbert White proved that there are no theoretical constraints preventing feedforward networks from approximating any measurable function. In principle, one can use feedforward networks to approximate measurable functions to any accuracy.
However, convergence slows down as the number of hidden units grows, so there is a trade-off between accuracy and convergence rate: more hidden units lead to slower convergence but higher accuracy.
Here is a quick review of the history of this topic.

Cybenko 1989
Cybenko proved that sums of the form
$$ \sum_k v_k \sigma(w_k x + u_k) $$
can approximate any continuous function because they are dense in the space of continuous functions. In this result, $\sigma$ is a continuous sigmoidal function and the parameters are real.
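As a numerical illustration of this density result (a sketch, not Cybenko's construction): fix random hidden parameters $w_k, u_k$ and fit only the outer weights $v_k$ by least squares to approximate a smooth target function. The random initialization scale and the number of units are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Target: approximate sin(x) on [-3, 3] with sum_k v_k * sigma(w_k x + u_k)
x = np.linspace(-3.0, 3.0, 200)
target = np.sin(x)

K = 100  # number of hidden units (illustrative choice)
w = rng.normal(scale=2.0, size=K)  # random w_k
u = rng.normal(scale=2.0, size=K)  # random u_k

# Hidden-layer outputs sigma(w_k x + u_k) with the logistic sigmoid
H = 1.0 / (1.0 + np.exp(-(np.outer(x, w) + u)))

# Fit the outer weights v_k by linear least squares
v, *_ = np.linalg.lstsq(H, target, rcond=None)
approx = H @ v

max_err = np.max(np.abs(approx - target))
```

With enough hidden units, `max_err` becomes small, in line with the density statement; with fewer units, the approximation degrades, which is the accuracy/convergence trade-off mentioned above.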

Hornik 1989
“Single hidden layer feedforward networks can approximate any measurable functions arbitrarily well regardless of the activation function, the dimension of the input and the input space environment.”
Reference: http://deeplearning.cs.cmu.edu/notes/Sonia_Hornik.pdf
Activation Functions
There are many activation functions.
- Unipolar sigmoid function
- Bipolar sigmoid function
- Hyperbolic tangent (tanh)
- Radial basis function
- Conic section function
- Rectified linear unit (ReLU)
- Leaky ReLU
- ELU
- Swish
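For quick reference, several of the activation functions listed above can be written down directly (a sketch; the radial basis and conic section variants depend on extra structure and are omitted):

```python
import numpy as np

def unipolar_sigmoid(x):
    # maps R -> (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def bipolar_sigmoid(x):
    # maps R -> (-1, 1); equals tanh(x/2)
    return (1.0 - np.exp(-x)) / (1.0 + np.exp(-x))

def relu(x):
    # rectified linear unit: max(0, x)
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    # small slope alpha for negative inputs avoids dead neurons
    return np.where(x > 0, x, alpha * x)

def elu(x, alpha=1.0):
    # exponential linear unit: smooth for x < 0
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

def swish(x, beta=1.0):
    # x * sigmoid(beta * x)
    return x / (1.0 + np.exp(-beta * x))
```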
Solving Differential Equations
The problem to solve here is
$$ \frac{d}{dt}y(t) = -y(t), $$
with initial condition $y(0)=1$.
To construct a single layered neural network, the function is decomposed using
$$ \begin{align} y(t_i) & = y(t_0) + t_i v_k f(t_i w_k+u_k) \\ &= 1+t_i v_k f(t_i w_k+u_k) , \end{align} $$
where $y(t_0)$ is the initial condition and $k$ is summed over.
This function should act as a gate that controls whether the neuron is triggered or not. The expit (logistic sigmoid) function serves this purpose well,
$$ f(x) = \frac{1}{1+\exp(-x)}. $$
One important reason for choosing it is that many of the resulting expressions can be calculated analytically and easily.
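For instance, assuming the standard logistic form $f(x) = 1/(1+e^{-x})$, the derivative satisfies the closed-form identity $f'(x) = f(x)\,(1 - f(x))$, which can be checked against a finite difference:

```python
import numpy as np

def f(x):
    # logistic sigmoid (expit)
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-4.0, 4.0, 9)
h = 1e-6

# central finite difference vs. the analytic identity f' = f * (1 - f)
numeric = (f(x + h) - f(x - h)) / (2.0 * h)
analytic = f(x) * (1.0 - f(x))

max_dev = np.max(np.abs(numeric - analytic))
```

This identity is what makes the derivative of the trial function (and hence the cost below) available in closed form.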
With this trial function, we can define a cost
$$ I=\sum_i\left( \frac{dy}{dt}(t_i)+y(t_i) \right)^2, $$
which is minimized to 0 when the network parameters are optimized for this problem.
Now the task becomes clear:
- Write down the cost analytically;
- Minimize the cost to find the optimal parameters;
- Substitute the parameters back into the trial function, and we are done.
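The three steps above can be sketched numerically by penalizing the residual $dy/dt + y$ from the cost $I$ at a set of collocation points (assuming `scipy` for the minimization; the number of units, collocation grid, and initialization are illustrative choices):

```python
import numpy as np
from scipy.optimize import minimize

K = 3                            # hidden units (illustrative)
ts = np.linspace(0.0, 1.0, 10)   # collocation points t_i

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def unpack(p):
    # flat parameter vector -> (v_k, w_k, u_k)
    return p[:K], p[K:2*K], p[2*K:]

def trial(t, p):
    # y(t) = 1 + t * sum_k v_k f(w_k t + u_k); y(0) = 1 by construction
    v, w, u = unpack(p)
    s = sigmoid(np.outer(t, w) + u)
    return 1.0 + t * (s @ v)

def dtrial(t, p):
    # dy/dt by the chain rule, using f' = f (1 - f)
    v, w, u = unpack(p)
    s = sigmoid(np.outer(t, w) + u)
    ds = s * (1.0 - s)
    return s @ v + t * ((ds * w) @ v)

def cost(p):
    # I = sum_i (dy/dt(t_i) + y(t_i))^2
    return np.sum((dtrial(ts, p) + trial(ts, p)) ** 2)

rng = np.random.default_rng(0)
p0 = 0.1 * rng.normal(size=3 * K)
res = minimize(cost, p0, method="BFGS")
```

After the minimization, `trial(ts, res.x)` can be compared against the exact solution $e^{-t}$ of the problem above.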
Overfitting
It is possible to overfit a network so that it works only on the training data. To avoid this, several strategies are used.
- Split the data into two parts, one for training and one for testing.
- Use more data: at least 10 times as many examples as the degrees of freedom of the model.
- Regularize by adding an artificial term to the cost function; for example, an $L^2$ penalty on the parameters.
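The regularization strategy can be sketched by adding an $L^2$ penalty to the cost (the weighting `lam` is a hypothetical choice to be tuned, e.g. on the held-out test split):

```python
import numpy as np

def regularized_cost(residuals, params, lam=1e-3):
    # data-fit term plus an L2 penalty on the parameters;
    # lam controls the trade-off between fitting and smoothness
    return np.sum(residuals ** 2) + lam * np.sum(params ** 2)

# the penalty grows with parameter magnitudes, discouraging the
# overly large weights typical of an overfitted model
```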
Neural Network and Finite Element Method
We consider the solution to a differential equation
$$ \mathcal L \psi - f = 0. $$
A neural network is quite similar to the finite element method. In the spirit of the finite element method, we can write down a neural-network-structured form of the function[^1]
$$ \psi(x_i) = A(x_i) + F(x_i, \mathcal N_i), $$
where $\mathcal N$ is the neural network structure. Specifically,
$$ \mathcal N_i = \sigma( w_{ij} x_j + u_i ). $$
The function is parameterized using the network. This parameterization is similar to the collocation method in the finite element method, where multiple basis functions are used at each location.
One of the choices of the function $F$ is a linear combination,
$$ F(x_i, \mathcal N_i) = x_i \mathcal N_i, $$
and $A(x_i)$ should take care of the boundary condition.
With this parameterization, the differential equation itself becomes a minimization problem: we require
$$ \lvert \mathcal L \psi - f \rvert^2 \to 0 $$
at each point.
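A small sketch of this construction for a scalar input with boundary condition $\psi(0) = \psi_0$; the names `A`, `N`, and `F` follow the equations above, and the single-unit network is an illustrative simplification of $\mathcal N_i = \sigma(w_{ij} x_j + u_i)$:

```python
import numpy as np

def A(x, psi0=1.0):
    # boundary term: satisfies psi(0) = psi0 on its own
    return psi0

def N(x, w, u):
    # single-unit network sigma(w x + u)
    return 1.0 / (1.0 + np.exp(-(w * x + u)))

def F(x, n):
    # linear-combination choice F(x, N) = x * N
    return x * n

def psi(x, w, u, psi0=1.0):
    return A(x, psi0) + F(x, N(x, w, u))

# at x = 0 the network term vanishes, so the boundary
# condition holds for any parameters w, u
```

Because `F` vanishes at the boundary, the minimization over the network parameters never has to fight the boundary condition; this is the same trick used for the initial condition $y(0)=1$ earlier.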

[^1]: Freitag, K. J. (2007). Neural networks and differential equations.
References:
- Hassoun, M. H. (1995). Fundamentals of Artificial Neural Networks. MIT Press. Available: https://mitpress.mit.edu/books/fundamentalsartificialneuralnetworks
- Shenouda, E. A. M. A. (2006). A Quantitative Comparison of Different MLP Activation Functions in Classification. In: Advances in Neural Networks - ISNN 2006. Springer Berlin Heidelberg, pp. 849–857. doi:10.1007/11759966_125
- Hornik, K., Stinchcombe, M., & White, H. (1989). Multilayer feedforward networks are universal approximators. Neural Networks, 2(5), 359–366.
- Cybenko, G. (1989). Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals, and Systems, 2(4), 303–314.
- Freitag, K. J. (2007). Neural networks and differential equations.
- Tensorflow and deep learning, without a PhD, by Martin Görner
- Kolmogorov, A. N. (1957). On the representation of continuous functions of several variables by superposition of continuous functions of one variable and addition. Doklady Akademii Nauk USSR, 114, 679–681.
- Performance Analysis of Various Activation Functions in Generalized MLP Architectures of Neural Networks
- Lippe, P. Tutorial 3: Activation Functions. UvA DL Notebooks v1.1 documentation. In: UvA Deep Learning Tutorials [Internet]. [cited 23 Sep 2021]. Available: https://uvadlcnotebooks.readthedocs.io