Wednesday, June 16, 2021

Nonparametric tests in R

1. What is a nonparametric test?

Nonparametric tests do not require the data to follow any particular distribution, which is why they are also known as distribution-free tests. When the data do not fulfill the normality assumption, these tests are recommended.




These tests should be used only when the assumptions of parametric tests are not fulfilled. If the sample size is sufficiently large, parametric tests can often still be used.

2. When to use nonparametric tests:

a. Skewed data: Parametric tests can be used when their assumptions of normal distribution and homogeneity of variance are satisfied. If the data are skewed, the mean is no longer the best measure of central tendency because it is affected by extreme values; in such cases the data are better represented by the median.

b. If the sample size is too small.

c. The analyzed data are either nominal or ordinal.

d. When there are definite outliers. 

3. Types of nonparametric tests

a. Mann-Whitney U test: It is the nonparametric alternative to the independent-sample t test.

b. Wilcoxon signed rank test: It is the nonparametric counterpart of the paired-sample t test.

c. Kruskal-Wallis test: It is the nonparametric alternative to one-way ANOVA.

Note: Please see the conditions for using the independent-sample t test, paired-sample t test and one-way ANOVA in my previous articles.
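Before working with the real data files below, here is a minimal sketch of how these three alternatives are called in R. The data are simulated and all variable names are made up purely for illustration:

```r
set.seed(10)
a <- rexp(12); b <- rexp(12) + 0.5              # two independent samples
before <- rexp(12); after <- before + rexp(12)  # paired measurements
y <- rexp(24); g <- factor(rep(1:3, each = 8))  # one-way layout with 3 groups

wilcox.test(a, b)                          # Mann-Whitney U test
wilcox.test(before, after, paired = TRUE)  # Wilcoxon signed rank test
kruskal.test(y ~ g)                        # Kruskal-Wallis test
```

All three return an object of class "htest", so the p-value can be extracted with `$p.value` in each case.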


4. Tests in R 

a. Binomial test

Example: In a trial of a COVID vaccine, 65 out of 100 cases were effective, while the claimed effectiveness was 80%.

> binom.test(65, 100, 0.8)

Exact binomial test

data:  65 and 100

number of successes = 65, number of trials = 100, p-value = 0.0004141

alternative hypothesis: true probability of success is not equal to 0.8 (Significant difference noted) 

95 percent confidence interval:

 0.5481506 0.7427062

sample estimates:

probability of success 

                  0.65 
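For intuition, the exact two-sided p-value above can be reproduced by hand: sum the probabilities of every outcome that is no more likely than the observed 65 successes under H0: p = 0.8. This is a sketch of the idea behind the exact test:

```r
n <- 100; x <- 65; p0 <- 0.8
d <- dbinom(0:n, n, p0)       # probability of each possible outcome under H0
relErr <- 1 + 1e-7            # small tolerance for near-ties in the densities
p_value <- sum(d[d <= dbinom(x, n, p0) * relErr])
p_value                       # same exact p-value as binom.test(65, 100, 0.8)
```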

b. Wilcoxon signed rank test

File used: nonpara.xlsx https://drive.google.com/file/d/1dZWDJ9Sb309pgrUe2aLZUyGQsU4SFlxS/view?usp=sharing

Commands to follow: 

A. Import and attach file

B. Define factors 

> nonpara$Trt<-as.factor(nonpara$Trt)

> nonpara$mulcing<-as.factor(nonpara$mulcing)

C. Observe summary

> summary(nonpara)

      Trt     mulcing   yield_fefore    Yield_after   

 1      : 4   No :18   Min.   :1.780   Min.   :2.330  

 2      : 4   yes: 9   1st Qu.:1.780   1st Qu.:2.360  

 3      : 4   Yes: 9   Median :1.814   Median :2.405  

 4      : 4            Mean   :1.851   Mean   :2.440  

 5      : 4            3rd Qu.:1.864   3rd Qu.:2.473  

 6      : 4            Max.   :2.450   Max.   :2.910  

 (Other):12      

D. Check normality

1st way:

> shapiro.test(Yield_after)

Shapiro-Wilk normality test

data:  Yield_after

W = 0.78766, p-value = 9.35e-06

Interpretation: The p-value is less than 0.05, so the data are not normally distributed.

2nd way:

> res.aov <- aov(Yield_after ~ Trt , data = nonpara)

> summary(res.aov) 

            Df Sum Sq Mean Sq F value  Pr(>F)    

Trt          8 0.4212 0.05265   16.21 1.8e-08 ***

Residuals   27 0.0877 0.00325                    

---

Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

> model.tables(res.aov, type="means", se = TRUE)

Tables of means

Grand mean     

2.440389 

 Trt 
Trt
     1      2      3      4      5      6      7      8      9 
2.4535 2.3555 2.5100 2.4443 2.4443 2.3555 2.3638 2.3343 2.7025 

Standard errors for differences of means
           Trt
        0.0403
replic.      4

> aov_residuals <- residuals(object = res.aov)
> shapiro.test(x = aov_residuals )

Shapiro-Wilk normality test

data:  aov_residuals
W = 0.8176, p-value = 3.715e-05

> plot(res.aov,2)

Note: As long as the points lie close to the dotted line we can assume normality; here, however, several points clearly deviate from it.
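The same Q-Q check can be produced directly with qqnorm() and qqline(). A minimal sketch on simulated right-skewed data, standing in for the real response:

```r
set.seed(2)
y <- rexp(36)        # simulated skewed response
qqnorm(y)            # points bend away from the reference line when data are skewed
qqline(y, lty = 2)   # dashed reference line, analogous to plot(res.aov, 2)
```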

See the normal curve over the histogram

> g <- yield_fefore

> hist(g)

> m <- mean(g)

> std <- sqrt(var(g))

> hist(g, density=20, breaks=12, prob=TRUE,
+       xlab="yield_fefore", ylim=c(0, 9.5),
+       main="normal curve over histogram")

> curve(dnorm(x, mean=m, sd=std),
+       col="red", lwd=2, add=TRUE, yaxt="n")


Note: The data are positively skewed.
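The visual impression can be backed up with a sample skewness statistic (positive for right-skewed data). This sketch uses simulated data in place of yield_fefore:

```r
# Sample skewness: the third standardized moment; > 0 means positive (right) skew
skewness <- function(x) {
  n <- length(x); m <- mean(x)
  (sum((x - m)^3) / n) / (sum((x - m)^2) / n)^1.5
}
set.seed(1)
g <- rexp(100)   # exponential data are right-skewed (theoretical skewness = 2)
skewness(g)      # a clearly positive value
```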

Check homogeneity of variance

> bartlett.test(yield_fefore~Trt)

Bartlett test of homogeneity of variances

data:  yield_fefore by Trt

Bartlett's K-squared = Inf, df = 8, p-value < 2.2e-16

Note: As the p-value is less than 0.05, the variances are not homogeneous.

We can also use Levene's test for this (leveneTest() is provided by the car package).

> leveneTest(yield_fefore~Trt, data = nonpara)

Levene's Test for Homogeneity of Variance (center = median)

      Df F value Pr(>F)

group  8  1.1062 0.3898

      27               

Note: Bartlett's test assumes normality, whereas Levene's test is more robust to departures from it; with non-normal data like these, that is why the two tests can disagree.

As none of the checks showed that the data are normal, we can move on to nonparametric tests.

E. Wilcoxon signed rank test (alternative to the paired-sample t test)

> wilcox.test(yield_fefore,Yield_after, paired = TRUE)

Wilcoxon signed rank test with continuity correction

data:  yield_fefore and Yield_after

V = 0, p-value = 1.712e-07

alternative hypothesis: true location shift is not equal to 0

Note: As the p-value is less than 0.05, a significant difference was observed between yield before and yield after.
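The V statistic above is the sum of the ranks of |before − after| over the pairs where the first value exceeds the second, so V = 0 means yield increased in every single pair. A sketch on simulated paired data (names are illustrative):

```r
set.seed(42)
before <- rnorm(10, mean = 2)
after  <- before + abs(rnorm(10, 0.5, 0.1))  # every pair increases
d <- before - after                          # all differences negative here
V <- sum(rank(abs(d))[d > 0])                # rank sum of positive differences
V   # 0 here, matching wilcox.test(before, after, paired = TRUE)$statistic
```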

F. Mann Whitney Wilcoxon test

File used: t.xlsx https://drive.google.com/file/d/1DEFqFbemqia3FlEPyIPxeGQSIApbs8c_/view?usp=sharing

> wilcox.test(yield_before~INM)

Wilcoxon rank sum test with continuity correction

data:  yield_before by INM

W = 0, p-value = 0.001723

Note: A significant difference in yield was noted between the practice of integrated nutrient management and not practicing it, as p < 0.01.
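Similarly, the W statistic reported above is the rank sum of the first sample minus n1(n1 + 1)/2, so W = 0 means every value in one group is smaller than every value in the other. A sketch with two fully separated simulated groups:

```r
set.seed(3)
x <- rnorm(8)               # e.g. yields without the practice
y <- rnorm(8, mean = 10)    # e.g. yields with the practice, far higher
r <- rank(c(x, y))
W <- sum(r[1:8]) - 8 * (8 + 1) / 2   # Mann-Whitney statistic for sample x
W   # 0: all of x ranks below all of y, as in the output above
```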

G. Kruskal Wallis test

File used: nonpara.xlsx 

> kruskal.test(yield_fefore~Trt, data=nonpara)

Kruskal-Wallis rank sum test

data:  yield_fefore by Trt

Kruskal-Wallis chi-squared = 28.844, df = 8, p-value = 0.0003377

Note: A significant difference in yield was observed among the treatments.
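The chi-squared value reported here is the Kruskal-Wallis H statistic, computed from rank sums. A sketch of the no-ties formula on a simulated 9-treatment, 4-replicate layout:

```r
set.seed(1)
y <- rnorm(36)                    # simulated yields
g <- factor(rep(1:9, each = 4))   # 9 treatments, 4 replicates each
n <- length(y); r <- rank(y)
R <- tapply(r, g, sum)            # rank sum per treatment
H <- 12 / (n * (n + 1)) * sum(R^2 / 4) - 3 * (n + 1)
H   # equals kruskal.test(y ~ g)$statistic when there are no ties
```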

Now for mean separation, use the following command. 

> library(PMCMR)

> library(PMCMRplus)

> posthoc.kruskal.dunn.test(x=yield_fefore, g=Trt, p.adjust.method = "bonferroni")

Pairwise comparisons using Dunn's test for multiple comparisons of independent samples

data: yield_fefore and Trt

  1      2      3      4      5      6      7      8     

2 1.0000 -      -      -      -      -      -      -     

3 1.0000 0.6591 -      -      -      -      -      -     

4 1.0000 1.0000 1.0000 -      -      -      -      -     

5 1.0000 1.0000 1.0000 1.0000 -      -      -      -     

6 1.0000 1.0000 0.6591 1.0000 1.0000 -      -      -     

7 0.3524 1.0000 0.1040 1.0000 1.0000 1.0000 -      -     

8 0.3524 1.0000 0.1040 1.0000 1.0000 1.0000 1.0000 -     

9 1.0000 0.0878 1.0000 1.0000 1.0000 0.0878 0.0094 0.0094

Note: Significant differences were seen between treatments 7 vs 9 and 8 vs 9.

H. Van der Waerden test

> vanWaerden.test(x=yield_fefore,g=Trt)

Van der Waerden normal scores test

data:  yield_fefore and Trt

Van der Waerden chi-squared = 28.264, df = 8, p-value = 0.0004266

alternative hypothesis: true location shift is not equal to 0

Note: A significant difference was observed in this case as well.
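The "normal scores" in this test are obtained by converting ranks to standard normal quantiles, qnorm(rank/(n + 1)); the test then compares treatment groups on these scores. A minimal sketch of the scoring step (the full statistic is what vanWaerden.test computes):

```r
set.seed(4)
y <- rexp(36); g <- factor(rep(1:9, each = 4))   # simulated one-way layout
n <- length(y)
A <- qnorm(rank(y) / (n + 1))   # van der Waerden normal scores
tapply(A, g, mean)              # group means far from 0 drive the statistic
```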

> posthoc.vanWaerden.test(x=yield_fefore,g=Trt,p.adjust.method="none")

Pairwise comparisons using van der Waerden normal scores test for multiple comparisons of independent samples

data: yield_fefore and Trt


  1       2       3       4       5       6       7       8      

2 0.00138 -       -       -       -       -       -       -      

3 0.23618 5.6e-05 -       -       -       -       -       -      

4 0.09282 0.07945 0.00643 -       -       -       -       -      

5 0.40986 0.01107 0.05033 0.37334 -       -       -       -      

6 0.00138 1.00000 5.6e-05 0.07945 0.01107 -       -       -      

7 6.0e-05 0.24680 2.3e-06 0.00566 0.00056 0.24680 -       -      

8 6.0e-05 0.24680 2.3e-06 0.00566 0.00056 0.24680 1.00000 -      

9 0.01343 1.2e-06 0.16302 0.00016 0.00171 1.2e-06 5.9e-08 5.9e-08

Note: Significant differences can be noted between the treatment pairs whose p-values are below 0.05.







1 Comment:

At August 26, 2023 at 7:55 PM , Blogger Lakshya said...

Sir, Which non parametric test we should do for 2 factor CRD and How can we do it in R? And how to interpret and show that in table in paper?

 
