Chi-square test in RStudio
1. What is the chi-square test?
A chi-square (χ²) statistic measures how the actual observed data compare with the expected values. The data used must be randomly sampled, recorded as raw counts, classified into mutually exclusive categories, drawn from independent observations, and drawn from a large enough sample.
The chi-square test is used to test the relationship between categorical variables. As a test of independence, it evaluates two variables at a time using a cross tabulation. The null hypothesis states that there is no relationship between the selected categorical variables. An example of a research question that could be answered using this test is given below:
Is there a significant relationship between gender and the natural resource management activities people carry out?
The formula for calculating chi-square is given below:
$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$
where $f_o$ = the observed frequency (the observed counts in the cells)
and $f_e$ = the expected frequency if NO relationship existed between the variables
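As a rough illustration of the calculation, the statistic can be computed by hand in R and compared with the built-in chisq.test(). The table below is a made-up example (its counts and labels are assumptions, not the data used in this post), and the sketch assumes the expected frequency of each cell is (row total × column total) / grand total.

```r
# Hypothetical 2 x 3 table of counts (illustrative values, not the blog's data)
observed <- matrix(c(12,  8, 10,
                      8, 12, 10),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(disease = c("yes", "no"),
                                   breed   = c("A", "B", "C")))

# Expected frequency of each cell: (row total * column total) / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Chi-square statistic: sum over all cells of (fo - fe)^2 / fe
chi_sq <- sum((observed - expected)^2 / expected)

# Degrees of freedom: (rows - 1) * (columns - 1)
df <- (nrow(observed) - 1) * (ncol(observed) - 1)

# Cross-check against R's built-in test
builtin <- chisq.test(observed, correct = FALSE)
all.equal(chi_sq, unname(builtin$statistic))
```

In this made-up table every expected count is 10, so the hand calculation is easy to follow cell by cell.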
In contrast to the chi-square test, Fisher's exact test is used to test significance when the sample size is small, although it is valid for all sample sizes. It is the way to test the association between two categorical variables when you have small cell sizes (expected count less than 5).
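The "expected count less than 5" rule of thumb can be sketched as a small helper that looks at the expected counts and picks the test accordingly. The function name choose_test, its min_expected argument, and the example table are all hypothetical and exist only for this illustration.

```r
# Hypothetical helper (not part of base R): choose the test from the expected counts
choose_test <- function(tab, min_expected = 5) {
  # Expected counts under independence
  expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
  if (any(expected < min_expected)) {
    fisher.test(tab)                  # small cells: exact test
  } else {
    chisq.test(tab, correct = FALSE)  # large enough cells: chi-square test
  }
}

# Illustrative small table (made-up counts): several expected counts are below 5
small_tab <- matrix(c(3, 1, 2,
                      4, 2, 3),
                    nrow = 2, byrow = TRUE)
result <- choose_test(small_tab)
result$method
```

This is only one possible convention; some analysts prefer to run Fisher's exact test whenever it is computationally feasible.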
2. How to do these tests in R?
#File name is Chisq_oneway_2sav
#Null hypothesis: There is no significant relation between disease occurrence and breed of cattle.
Import and print data
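# Import the data (assumed here to be an SPSS .sav file); the path is a placeholder, so replace it with the location of your copy
library(haven)
Chisq_oneway_2sav <- read_sav("path/to/your_file.sav")
attach(Chisq_oneway_2sav)
library(MASS)
print(Chisq_oneway_2sav)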
#Output is obtained as below:
# A tibble: 25 x 5
   sample vaccinated         disease_occurance Breed           Dressing_prcnt
    <dbl> <dbl+lbl>          <dbl+lbl>         <dbl+lbl>                <dbl>
 1      1 1 [vaccinated]     2 [no]            1 [ghatikhwile]             60
 2      2 1 [vaccinated]     2 [no]            1 [ghatikhwile]             54
 3      3 2 [non-vaccinated] 1 [yes]           1 [ghatikhwile]             67
 4      4 2 [non-vaccinated] 1 [yes]           1 [ghatikhwile]             76
 5      5 1 [vaccinated]     2 [no]            2 [sakini]                  67
 6      6 2 [non-vaccinated] 2 [no]            1 [ghatikhwile]             87
 7      7 1 [vaccinated]     1 [yes]           1 [ghatikhwile]             87
 8      8 2 [non-vaccinated] 2 [no]            1 [ghatikhwile]             77
 9      9 2 [non-vaccinated] 2 [no]            3 [broiler]                 87
10     10 1 [vaccinated]     2 [no]            3 [broiler]                 75
# ... with 15 more rows
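If you do not have the file at hand, the ten rows printed above can be typed into a data frame to follow along. This is only a sketch: it covers 10 of the 25 rows, so its results will not match the outputs reported in this post, and the object names first10 and tab10 are made up for the illustration. The codes are turned into labelled factors using the labels shown in the output.

```r
# First ten rows of the printed tibble, typed in by hand (codes as shown in the output)
first10 <- data.frame(
  sample            = 1:10,
  vaccinated        = c(1, 1, 2, 2, 1, 2, 1, 2, 2, 1),
  disease_occurance = c(2, 2, 1, 1, 2, 2, 1, 2, 2, 2),
  Breed             = c(1, 1, 1, 1, 2, 1, 1, 1, 3, 3),
  Dressing_prcnt    = c(60, 54, 67, 76, 67, 87, 87, 77, 87, 75)
)

# Replace the numeric codes with their labels
first10$disease_occurance <- factor(first10$disease_occurance,
                                    levels = c(1, 2),
                                    labels = c("yes", "no"))
first10$Breed <- factor(first10$Breed,
                        levels = 1:3,
                        labels = c("ghatikhwile", "sakini", "broiler"))

# Cross tabulation of disease occurrence by breed for these ten rows
tab10 <- table(disease = first10$disease_occurance, breed = first10$Breed)
tab10
```

Working with factors rather than raw codes makes the rows and columns of the cross tabulation easier to read.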
Perform Chi square test
chisq.test(disease_occurance,Breed,correct = FALSE)
#Output is obtained as:
Pearson's Chi-squared test
data: disease_occurance and Breed
X-squared = 0.54113, df = 2, p-value = 0.763
#Interpretation: There is no significant association between disease occurrence and breed.
#See the warning message.
Warning message:
In chisq.test(disease_occurance, Breed, correct = FALSE) :
Chi-squared approximation may be incorrect
Or
chisq <- chisq.test(disease_occurance,Breed)
Warning message:
In chisq.test(disease_occurance, Breed) :
Chi-squared approximation may be incorrect
chisq
#Output is seen as :
Pearson's Chi-squared test
data: disease_occurance and Breed
X-squared = 0.54113, df = 2, p-value = 0.763
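The two calls give the same X-squared even though only the first sets correct = FALSE. According to the chisq.test documentation, the continuity correction is applied only to 2 by 2 tables, and the disease by breed table here is 2 by 3. The sketch below illustrates this with made-up counts (both tables are assumptions for the example).

```r
# Made-up 2 x 3 table: the continuity correction is not applied, so both calls agree
tab_2x3 <- matrix(c(20, 15, 25,
                    30, 25, 35),
                  nrow = 2, byrow = TRUE)
stat_2x3_corrected   <- chisq.test(tab_2x3, correct = TRUE)$statistic
stat_2x3_uncorrected <- chisq.test(tab_2x3, correct = FALSE)$statistic
all.equal(stat_2x3_corrected, stat_2x3_uncorrected)

# Made-up 2 x 2 table: here the correction lowers the statistic
tab_2x2 <- matrix(c(20, 15,
                    30, 35),
                  nrow = 2, byrow = TRUE)
stat_2x2_corrected   <- chisq.test(tab_2x2, correct = TRUE)$statistic
stat_2x2_uncorrected <- chisq.test(tab_2x2, correct = FALSE)$statistic
c(corrected = unname(stat_2x2_corrected),
  uncorrected = unname(stat_2x2_uncorrected))
```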
# If you see this warning message (shown in red in the console), some expected counts are small, so use Fisher's exact test
fisher.test(disease_occurance,Breed)
#See output
Fisher's Exact Test for Count Data
data: disease_occurance and Breed
p-value = 0.8763
alternative hypothesis: two.sided
#Interpretation: No Significant association between the variables
table(disease_occurance,Breed)
#See output
To see the proportions, run the following command:
prop.table(table(disease_occurance,Breed))
#See output
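By default prop.table() divides each cell by the grand total. Its margin argument gives row or column proportions instead, which are often easier to interpret. The table below uses made-up counts and labels purely for illustration.

```r
# Made-up contingency table (illustrative counts only)
tab <- matrix(c(3, 1, 2,
                4, 2, 3),
              nrow = 2, byrow = TRUE,
              dimnames = list(disease = c("yes", "no"),
                              breed   = c("A", "B", "C")))

overall_prop <- prop.table(tab)              # each cell / grand total
row_prop     <- prop.table(tab, margin = 1)  # each cell / its row total
col_prop     <- prop.table(tab, margin = 2)  # each cell / its column total

# Row percentages, rounded to one decimal place
round(100 * row_prop, 1)
```

With margin = 1 each row sums to 1, and with margin = 2 each column sums to 1.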
Balloon plots: A balloon plot draws a graphical matrix in which each cell contains a dot whose size reflects the relative magnitude of the corresponding component.
library(ggpubr)
ggballoonplot(Chisq_oneway_2sav)
#Output is obtained as :
#Or
library(ggpubr)
library(ggplot2)
theme_set(theme_pubr())
ggballoonplot(Chisq_oneway_2sav,fill="value")+scale_fill_viridis_c(option = "C")
#Output is obtained as:
or
library(gplots)
# Build the contingency table of disease occurrence by breed before plotting
dt <- table(disease_occurance, Breed)
balloonplot(t(dt), main ="disease_occurance", xlab ="", ylab="",
label = FALSE, show.margins = FALSE)
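Method to perform corrplot: corrplot() draws a graphical display of a matrix. It is usually applied to a correlation matrix, but with is.cor = FALSE it can display other matrices, such as the residuals of a chi-square test.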
library(corrplot)
chisq$observed
# To see the expected counts and the residuals, rounded to 2 and 3 decimal places respectively, run the following commands.
round(chisq$expected, 2)
round(chisq$residuals, 3)
corrplot(chisq$residuals, is.cor = FALSE)
#Interpretation: The columns are for breed and the rows are for disease occurrence. Blue shows a positive residual (more cases observed than expected, a positive association) whereas red shows a negative residual (fewer cases observed than expected). The values are given by the color scale in the figure. The more intense the color, the stronger the association.
contrib <- 100*chisq$residuals^2/chisq$statistic
round(contrib, 3)
#Output is shown as:
#Note: Each value is the percentage that a cell contributes to the total chi-square statistic (100 × squared residual / chi-square). The cells with the largest contributions account for most of the association and are the main aid to interpretation (see Le Roux and Rouanet, 2004 and 2010).
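A quick way to sanity-check the contribution formula is to confirm that the percentages add up to 100. The sketch below uses a made-up 2 by 3 table (the counts and the names tab_demo, chisq_demo, and contrib_demo are assumptions for the example), so the numbers differ from the ones in this post.

```r
# Made-up 2 x 3 table of counts (illustrative only)
tab_demo <- matrix(c(10, 20, 30,
                     20, 20, 20),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(disease = c("yes", "no"),
                                   breed   = c("A", "B", "C")))

chisq_demo <- chisq.test(tab_demo)

# Percentage contribution of each cell: 100 * residual^2 / chi-square
contrib_demo <- 100 * chisq_demo$residuals^2 / chisq_demo$statistic
round(contrib_demo, 3)

# The contributions of all cells should sum to 100
sum(contrib_demo)
```

Because the squared Pearson residuals sum to the chi-square statistic when no continuity correction is applied, the total is 100 for any table larger than 2 by 2.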
corrplot(contrib, is.cor = FALSE)
#Output is seen as
chisq$p.value
#Output: [1] 0.76295 (Not Significant)
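The p-value can also be recovered from the reported statistic and degrees of freedom using the chi-square distribution, which is a useful cross-check. The sketch below plugs in the X-squared and df values printed earlier; the 0.05 significance level is an assumed convention, not something stated in the output.

```r
# Values taken from the test output above
x_squared <- 0.54113
df <- 2

# Upper-tail probability of the chi-square distribution
p_value <- pchisq(x_squared, df = df, lower.tail = FALSE)
round(p_value, 5)

# Critical value at the (assumed) 5% significance level
critical <- qchisq(0.95, df = df)

# The statistic is well below the critical value, so the null hypothesis is not rejected
x_squared < critical
```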
chisq$estimate
#Output: NULL (the object returned by chisq.test has no estimate component)
The procedure is shown in the video below:
Chi square test
The Link for the file I used is:
https://drive.google.com/file/d/1lY1OycG-t7ilkA8kj5QczFscSCF35jAu/view?usp=sharing