Chi-square Correlation Test for Nominal Data
In this article, we will discuss the chi-square correlation test for detecting correlations between two series.
Steps
- Find out all the possible values of the two nominal series A and B;
- Count the co-occurrences of the combinations (A, B);
- Calculate the expected co-occurrences of the combinations (A, B);
- Calculate chi-square;
- Determine whether the hypothesis can be rejected.
Define the Series
Suppose we are analyzing two series A and B. Series A can take values $a_1$ and $a_2$, while series B can take values $b_1$, $b_2$ and $b_3$.
$$ \begin{align} A &:= \{a_1, a_2\} \\ B &:= \{b_1, b_2, b_3\} \end{align} $$
As an example, we will use the following A and B series for our calculations in this article.
index | A | B |
---|---|---|
1 | a1 | b2 |
2 | a1 | b2 |
3 | a1 | b1 |
4 | a2 | b1 |
5 | a2 | b3 |
6 | a2 | b2 |
7 | a1 | b2 |
8 | a2 | b2 |
Count Co-occurrences
To analyze correlations between the two series, we look at how often the values of series A and those of series B occur together. For example, we would like to know how the values of B are distributed when A takes the value $a_1$.
Now we construct a contingency table to record the occurrences of the combinations (A, B).
B \ A | a1 | a2 |
---|---|---|
b1 | 1 | 1 |
b2 | 3 | 2 |
b3 | 0 | 1 |
where the cells are filled with the number of occurrences of the corresponding combinations. For example, the combination (a1, b1) occurred once, thus 1 in the first row first column.
This table records the observed frequencies. We denote it as table O and each of its cells as $o_{ij}$, the number of co-occurrences of $a_i$ and $b_j$.
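The counting step can be sketched in Python using only the standard library. The variable names (`A`, `B`, `pair_counts`, `observed`) are chosen here for illustration and are not part of the original article.

```python
from collections import Counter

# Example series from the table in this article (index 1 to 8).
A = ["a1", "a1", "a1", "a2", "a2", "a2", "a1", "a2"]
B = ["b2", "b2", "b1", "b1", "b3", "b2", "b2", "b2"]

# Count how often each (a, b) combination occurs together.
pair_counts = Counter(zip(A, B))

a_values = sorted(set(A))
b_values = sorted(set(B))

# observed[b][a] follows the layout of the contingency table:
# rows are values of B, columns are values of A.
# A Counter returns 0 for combinations that never occur, e.g. (a1, b3).
observed = {b: {a: pair_counts[(a, b)] for a in a_values} for b in b_values}

for b in b_values:
    print(b, [observed[b][a] for a in a_values])
# b1 [1, 1]
# b2 [3, 2]
# b3 [0, 1]
```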
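Pearson's chi-square statistic measures how far the observed frequencies deviate from the frequencies we would expect if A and B were independent.
First of all, we define an expectation table E, which holds the expected co-occurrences under independence. Each element of E is calculated as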
$$ e_{ij} = \frac{ \text{number of } a_i * \text{ number of } b_j }{ \text{ total number of rows in original table } } $$
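As a minimal sketch, the expectation table for the example series can be computed as follows. The names `count_a`, `count_b` and `expected` are illustrative assumptions, not notation from the article.

```python
from collections import Counter

# Example series from the table in this article.
A = ["a1", "a1", "a1", "a2", "a2", "a2", "a1", "a2"]
B = ["b2", "b2", "b1", "b1", "b3", "b2", "b2", "b2"]

n = len(A)            # total number of rows in the original table
count_a = Counter(A)  # number of a_i: a1 -> 4, a2 -> 4
count_b = Counter(B)  # number of b_j: b1 -> 2, b2 -> 5, b3 -> 1

# e_ij = (number of a_i) * (number of b_j) / (total number of rows)
expected = {
    (a, b): count_a[a] * count_b[b] / n
    for a in sorted(count_a)
    for b in sorted(count_b)
}

for pair, e in expected.items():
    print(pair, e)
# ('a1', 'b1') 1.0
# ('a1', 'b2') 2.5
# ('a1', 'b3') 0.5
# ('a2', 'b1') 1.0
# ('a2', 'b2') 2.5
# ('a2', 'b3') 0.5
```

The expected counts sum to the total number of rows, just like the observed counts.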
Now if we compare the observed table O with the expectation table E,
$$ o_{ij} - e_{ij} $$
we get the deviation from the expected table. With a few small tweaks (squaring each deviation and dividing it by the expected count), we define
$$ \chi^2 = \sum_{i,j} \frac{ (o_{ij} - e_{ij})^2 } { e_{ij} } $$
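Putting the pieces together, a small helper can evaluate this sum for two nominal series. The function name `chi_square` is hypothetical, and the sketch assumes both series have the same length.

```python
from collections import Counter


def chi_square(series_a, series_b):
    """Pearson's chi-square statistic for two equally long nominal series."""
    n = len(series_a)
    count_a = Counter(series_a)
    count_b = Counter(series_b)
    observed = Counter(zip(series_a, series_b))

    total = 0.0
    for a in count_a:
        for b in count_b:
            e = count_a[a] * count_b[b] / n  # expected count e_ij
            o = observed[(a, b)]             # observed count o_ij (0 if absent)
            total += (o - e) ** 2 / e
    return total


# Example series from the table in this article.
A = ["a1", "a1", "a1", "a2", "a2", "a2", "a1", "a2"]
B = ["b2", "b2", "b1", "b1", "b3", "b2", "b2", "b2"]

chi2_value = chi_square(A, B)
print(round(chi2_value, 4))  # 1.2
```

For the example data the six terms are 0, 0.1, 0.5 (for $a_1$) and 0, 0.1, 0.5 (for $a_2$), which add up to 1.2.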
How to Use the Chi-square Value
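The final question is how to use the result. The hypothesis being tested is that the two series are independent, i.e., not correlated. We choose a significance level (for example 0.05) and look up the corresponding threshold $\chi_0^2$ of the chi-square distribution with $(r-1)(c-1)$ degrees of freedom, where $r$ and $c$ are the numbers of distinct values of A and B. Whenever our calculated value is larger than $\chi_0^2$, we reject the hypothesis that the two series are independent and conclude that they are correlated. Otherwise, the data do not give us enough evidence of a correlation. Values of $\chi_0^2$ are tabulated in statistics textbooks.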
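For the example series, A has 2 distinct values and B has 3, so there are $(2-1)(3-1) = 2$ degrees of freedom. In this special case the chi-square distribution has a closed form (its tail probability is $e^{-x/2}$), so the decision can be sketched with the standard library alone. The significance level 0.05 is an assumption made for illustration, and the value 1.2 is the statistic obtained by applying the formula above to the example series. For other degrees of freedom a library routine such as `scipy.stats.chi2.ppf` would be needed instead.

```python
import math

chi2_value = 1.2  # statistic for the example series in this article
n_a, n_b = 2, 3   # number of distinct values of A and of B
dof = (n_a - 1) * (n_b - 1)
alpha = 0.05      # assumed significance level

# The closed form below is only valid for 2 degrees of freedom.
assert dof == 2
threshold = -2.0 * math.log(alpha)     # critical value, about 5.99
p_value = math.exp(-chi2_value / 2.0)  # tail probability, about 0.55

# Reject independence only if the statistic exceeds the threshold.
reject_independence = chi2_value > threshold

print(f"threshold = {threshold:.3f}, p-value = {p_value:.3f}")
print("reject independence:", reject_independence)
# threshold = 5.991, p-value = 0.549
# reject independence: False
```

Since 1.2 is well below the threshold, the example data give no grounds to reject independence of A and B.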
Other Methods
L Ma (2018). 'Chi-square Correlation Test for Nominal Data', Datumorphism, 11 April. Available at: https://datumorphism.leima.is/wiki/statistics/correlation-analysis-chi-square/.