# Chi-square Correlation Test for Nominal Data

In this article, we discuss the chi-square correlation test for detecting correlations between two nominal series.

## Steps

- Find out all the possible values of the two nominal series A and B;
- Count the co-occurrences of the combinations (A, B);
- Calculate the expected co-occurrences of the combinations (A, B);
- Calculate chi-square;
- Determine whether the hypothesis that the two series are independent can be rejected.

## Define the Series

Suppose we are analyzing two series A and B. Series A can take values $a_1$ and $a_2$, while series B can take values $b_1$, $b_2$ and $b_3$.

$$ \begin{align} A &:= \{a_1, a_2\} \\ B &:= \{b_1, b_2, b_3\} \end{align} $$

As an example, we will use the following A and B series for our calculations in this article.

| index | A | B |
|---|---|---|
| 1 | a1 | b2 |
| 2 | a1 | b2 |
| 3 | a1 | b1 |
| 4 | a2 | b1 |
| 5 | a2 | b3 |
| 6 | a2 | b2 |
| 7 | a1 | b2 |
| 8 | a2 | b2 |

## Count Co-occurrences

To analyze correlations between the two series, we look at how often the values of series A and the values of series B occur together. For example, given that $a_1$ occurred, we would like to know how likely each value of B is.

Now we construct a contingency table to record the occurrences of the combinations (A, B).

| | a1 | a2 |
|---|---|---|
| b1 | 1 | 1 |
| b2 | 3 | 2 |
| b3 | 0 | 1 |

where the cells are filled with the number of occurrences of the corresponding combinations. For example, the combination (a1, b1) occurred once, hence the 1 in the first row, first column.

This table records the **observed frequencies**. We denote it as table O and denote the cell for the combination ($a_i$, $b_j$) as $o_{ij}$.
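The counting step can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the variable names (`A`, `B`, `pair_counts`, `observed`) are chosen here for the example and are not part of the original article.

```python
from collections import Counter

# Example series from the table above (index 1 to 8).
A = ["a1", "a1", "a1", "a2", "a2", "a2", "a1", "a2"]
B = ["b2", "b2", "b1", "b1", "b3", "b2", "b2", "b2"]

# Count how often each combination (a, b) occurs.
# A Counter returns 0 for combinations that never occur, such as (a1, b3).
pair_counts = Counter(zip(A, B))

a_values = sorted(set(A))
b_values = sorted(set(B))

# Observed table O: one row per value of B, one column per value of A,
# matching the layout of the contingency table above.
observed = {b: {a: pair_counts[(a, b)] for a in a_values} for b in b_values}

for b in b_values:
    print(b, [observed[b][a] for a in a_values])
```

The printed rows reproduce the contingency table: `b1 [1, 1]`, `b2 [3, 2]`, `b3 [0, 1]`.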

Pearson's chi-square test compares these observed frequencies with the frequencies we would expect if the two series were independent.

First of all, we define an expectation table E. Each element of E is calculated as

$$ e_{ij} = \frac{ (\text{number of } a_i) \times (\text{number of } b_j) }{ \text{total number of rows in the original table} } $$
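As a rough sketch of this formula applied to the example series, the following Python snippet computes each $e_{ij}$ from the marginal counts. It uses only the standard library, and the names `a_counts`, `b_counts` and `expected` are illustrative choices, not taken from the article.

```python
from collections import Counter

# Example series from the article (index 1 to 8).
A = ["a1", "a1", "a1", "a2", "a2", "a2", "a1", "a2"]
B = ["b2", "b2", "b1", "b1", "b3", "b2", "b2", "b2"]

n = len(A)             # total number of rows in the original table: 8
a_counts = Counter(A)  # a1: 4, a2: 4
b_counts = Counter(B)  # b1: 2, b2: 5, b3: 1

# e_ij = (number of a_i) * (number of b_j) / n
expected = {
    (a, b): a_counts[a] * b_counts[b] / n
    for a in sorted(a_counts)
    for b in sorted(b_counts)
}

for (a, b), e in expected.items():
    print(a, b, e)
```

For the example data, both (a1, b1) and (a2, b1) have an expected count of 1.0, the two b2 cells have 2.5, and the two b3 cells have 0.5. The expected counts sum to 8, the same total as the observed counts.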

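Now if we compare the observed table O with the expected table E,

$$ o_{ij} - e_{ij} $$

we get the deviation of each cell from its expected value. Squaring each deviation so that positive and negative deviations do not cancel, and dividing by the expected count to put the cells on a comparable scale, we define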

$$ \chi^2 = \sum_{i,j} \frac{ (o_{ij} - e_{ij})^2 } { e_{ij} } $$
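To make the sum concrete, here is a minimal Python sketch that evaluates $\chi^2$ for the example contingency table. The expected counts are derived from the row and column totals of the observed table, which is equivalent to the formula for $e_{ij}$. All variable names are illustrative.

```python
# Observed table O from the article, keyed by the combination (a, b).
observed = {
    ("a1", "b1"): 1, ("a2", "b1"): 1,
    ("a1", "b2"): 3, ("a2", "b2"): 2,
    ("a1", "b3"): 0, ("a2", "b3"): 1,
}

n = sum(observed.values())  # total number of rows in the original table

# Marginal totals: how often each value of A and of B occurs.
a_totals, b_totals = {}, {}
for (a, b), count in observed.items():
    a_totals[a] = a_totals.get(a, 0) + count
    b_totals[b] = b_totals.get(b, 0) + count

# chi^2 = sum over all cells of (o_ij - e_ij)^2 / e_ij
chi_square = 0.0
for (a, b), o in observed.items():
    e = a_totals[a] * b_totals[b] / n
    chi_square += (o - e) ** 2 / e

print(round(chi_square, 4))
```

For the example, the two b1 cells contribute 0, the two b2 cells contribute 0.1 each, and the two b3 cells contribute 0.5 each, so $\chi^2 = 1.2$.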

## How to Use the Chi-square Value

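The final question is how to use the result. The hypothesis being tested is that the two series are independent, i.e., not correlated. We choose a significance level, for example 0.05, and look up the corresponding threshold $\chi_0^2$ for $(r-1)(c-1)$ degrees of freedom, where $r$ and $c$ are the numbers of rows and columns of the contingency table. Whenever our calculated value is larger than this threshold, we reject the hypothesis that the two series are independent and conclude that they are correlated. Otherwise, the data give no grounds to reject independence. The value $\chi_0^2$ can be found in the chi-square distribution tables of statistics textbooks.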
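The decision step for the example can be sketched as follows, taking the hypothesis under test to be that A and B are independent. The sketch relies on assumptions that are not in the article: a significance level of 0.05, and the fact that for exactly 2 degrees of freedom the chi-square survival function has the closed form $e^{-x/2}$, so no statistics library is needed. For other degrees of freedom this shortcut does not apply and a table or a statistics package would be required.

```python
import math

chi_square = 1.2  # value of the chi-square formula for the example tables
n_a, n_b = 2, 3   # number of distinct values in series A and series B
dof = (n_a - 1) * (n_b - 1)  # degrees of freedom
alpha = 0.05      # assumed significance level, not specified in the article

# The closed forms below hold only when dof == 2.
assert dof == 2

# Threshold chi_0^2: the value whose upper-tail probability equals alpha.
critical_value = -2 * math.log(alpha)

# p-value: probability of a chi-square value at least this large
# if the two series were independent.
p_value = math.exp(-chi_square / 2)

reject_independence = chi_square > critical_value

print(f"threshold={critical_value:.3f}, p={p_value:.3f}, "
      f"reject independence: {reject_independence}")
```

Here the threshold is about 5.99 and the calculated value 1.2 is well below it, so the example data do not allow us to reject independence. With only 8 rows and expected counts as small as 0.5, the chi-square approximation is rough, so this result should be read as an illustration of the procedure rather than a reliable test.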

## Other Methods

L Ma (2018). 'Chi-square Correlation Test for Nominal Data', Datumorphism, 11 April. Available at: https://datumorphism.leima.is/wiki/statistics/correlation-analysis-chi-square/.