Bin Size of Histogram

[[Histograms]] are good for understanding the distribution of your data. Suppose we check out the burger prices at the stores of Han im Glück and get a list of numbers. We can arrange the numbers into bins of prices. For example, we can count the number of stores that have a price between 10 and 11 euros.

The Bin Size Problem

We will use the following series as an example.

[1.45,2.20,0.75,1.23,1.25,1.25,3.09,1.99,2.00,0.78,1.32,2.25,3.15,3.85,0.52,0.99,1.38,1.75,1.21,1.75]

If we use bin size 1, we get a spiky chart that is not very informative.

We could also set bin size to 2.

In principle, we could keep tuning the bin size until we get something pretty and informative. But tuning by hand like this would be quite tedious.
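To make the effect of the bin size concrete, here is a minimal sketch that counts how many values of the example series fall into each bin for a given bin size. The helper name `bin_counts` and the choice to start the first bin at 0 are assumptions made for illustration; they are not part of the original note.

```python
import math
from collections import Counter

data = [1.45, 2.20, 0.75, 1.23, 1.25, 1.25, 3.09, 1.99, 2.00, 0.78,
        1.32, 2.25, 3.15, 3.85, 0.52, 0.99, 1.38, 1.75, 1.21, 1.75]


def bin_counts(values, width, start=0.0):
    """Map the left edge of each non-empty bin to its count.

    Bin k covers the half-open interval
    [start + k * width, start + (k + 1) * width).
    Starting the bins at 0 is an assumption of this sketch.
    """
    counts = Counter(math.floor((v - start) / width) for v in values)
    return {start + k * width: counts[k] for k in sorted(counts)}


for width in (1, 2):
    print(f"bin size {width}: {bin_counts(data, width)}")
```

Changing `width` by hand and re-running is exactly the manual tuning loop described above, which is what the rules in the following sections try to avoid.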

Square-root

One simple way to estimate the number of bins needed is

$$B = \sqrt{N} ,$$

where $N$ is the length of the series.

In our example, $N=20$. Then we have $B=\sqrt{20}\approx 4.5$, which we round up to $5$. The series spans $3.85 - 0.52 = 3.33$, which leads to a bin size of $3.33/5 \approx 0.67$.

We immediately see the peak of this distribution.
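The arithmetic of the square-root choice can be checked with a short sketch. Rounding $\sqrt{N}$ up to the next integer is assumed here, since that reproduces the 5 bins used in the text; the variable names are illustrative.

```python
import math

data = [1.45, 2.20, 0.75, 1.23, 1.25, 1.25, 3.09, 1.99, 2.00, 0.78,
        1.32, 2.25, 3.15, 3.85, 0.52, 0.99, 1.38, 1.75, 1.21, 1.75]

n = len(data)

# Square-root choice: B = sqrt(N), rounded up to a whole number of bins
# (the rounding direction is an assumption of this sketch).
num_bins = math.ceil(math.sqrt(n))

# The bin size follows from spreading the data range evenly over the bins.
bin_width = (max(data) - min(data)) / num_bins

print(num_bins, round(bin_width, 2))  # 5 0.67
```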

Sturges’ formula

Sturges’ formula says that the number of bins of the histogram should be

$$ B = 1 + \log_2(N), $$

where $N$ is the length of the series.

In our example, $N=20$. We have $B = 1 + \log_2(20) \approx 5.3$, which we round to $B = 5$. The max and min of our series are $3.85$ and $0.52$, thus we have the bin size $W = (3.85 - 0.52)/5 \approx 0.67$, which is the same as the square-root method.
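A minimal sketch of the same calculation follows. It rounds $1 + \log_2(N)$ to the nearest integer, which is an assumption chosen to match the $B = 5$ used in the text; rounding up instead would give 6 bins for this series.

```python
import math

data = [1.45, 2.20, 0.75, 1.23, 1.25, 1.25, 3.09, 1.99, 2.00, 0.78,
        1.32, 2.25, 3.15, 3.85, 0.52, 0.99, 1.38, 1.75, 1.21, 1.75]

n = len(data)

# Sturges' formula: B = 1 + log2(N), about 5.32 for N = 20.
raw_bins = 1 + math.log2(n)

# Round to the nearest integer (assumption of this sketch, matching B = 5).
num_bins = round(raw_bins)

# Bin size from the data range and the number of bins.
bin_width = (max(data) - min(data)) / num_bins

print(round(raw_bins, 2), num_bins, round(bin_width, 2))  # 5.32 5 0.67
```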

Scott’s Rule

Scott’s rule says we should choose the bin width

$$W = 3.49 \sigma N^{-1/3}$$

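Here $\sigma$ is the standard deviation of the series and $N$ is its length. In our case, we have $N=20$ and a sample standard deviation of $\sigma=0.86$, which leads to $W=1.1$.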
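The numbers for Scott's rule can be reproduced with the standard library. This sketch assumes $\sigma$ is the sample standard deviation (dividing by $N-1$), since that is what yields the value $0.86$ for this series; the final step that converts the width into a number of bins is an illustrative addition.

```python
import math
import statistics

data = [1.45, 2.20, 0.75, 1.23, 1.25, 1.25, 3.09, 1.99, 2.00, 0.78,
        1.32, 2.25, 3.15, 3.85, 0.52, 0.99, 1.38, 1.75, 1.21, 1.75]

n = len(data)

# Sample standard deviation (N - 1 in the denominator); assumed here
# because it matches sigma = 0.86 for this series.
sigma = statistics.stdev(data)

# Scott's rule: W = 3.49 * sigma * N^(-1/3)
bin_width = 3.49 * sigma * n ** (-1 / 3)

# Illustrative extra step: number of bins needed to cover the data range.
num_bins = math.ceil((max(data) - min(data)) / bin_width)

print(round(sigma, 2), round(bin_width, 1), num_bins)  # 0.86 1.1 3
```

Unlike the two previous rules, Scott's rule yields a bin width directly, so the number of bins follows from the range of the data rather than the other way around.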


L Ma (2018). 'Bin Size of Histogram', Datumorphism, 11 April. Available at: https://datumorphism.leima.is/wiki/data-visualization/histogram-bin-size/.