Wednesday, June 16, 2021

Nonparametric tests in R

1. What is a nonparametric test?

Nonparametric tests do not require the data to follow any particular distribution, which is why they are also known as distribution-free tests. When the data do not fulfill the normality assumption, these tests are recommended.




These tests should be used only when the assumptions of parametric tests are not fulfilled. If the sample size is sufficiently large, parametric tests can often still be used.

2. When to use nonparametric tests:

a. Skewed data: Parametric tests can be used when their assumptions of normal distribution and homogeneity of variance are satisfied. If the data are skewed, the mean is no longer the best measure of central tendency because it is affected by extreme values; in such cases the data are better represented by the median.

b. If the sample size is too small.

c. The analyzed data are either nominal or ordinal.

d. When there are definite outliers. 

3. Types of nonparametric tests

a. Mann-Whitney U test: It is the nonparametric alternative to the independent-sample t test.

b. Wilcoxon signed rank test: It is the nonparametric counterpart of the paired-sample t test.

c. Kruskal-Wallis test: It is the nonparametric alternative to one-way ANOVA.

Note: Please see the conditions for using the independent-sample t test, paired-sample t test and one-way ANOVA in my previous articles.
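Before working with the real data files below, here is a minimal sketch of how these three alternatives are called in R. The data are simulated and all variable names are made up purely for illustration:

```r
set.seed(10)
a <- rexp(12); b <- rexp(12) + 0.5              # two independent samples
before <- rexp(12); after <- before + rexp(12)  # paired measurements
y <- rexp(24); g <- factor(rep(1:3, each = 8))  # one-way layout with 3 groups

wilcox.test(a, b)                          # Mann-Whitney U test
wilcox.test(before, after, paired = TRUE)  # Wilcoxon signed rank test
kruskal.test(y ~ g)                        # Kruskal-Wallis test
```

All three return an object of class "htest", so the p-value can be extracted with `$p.value` in each case.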


4. Tests in R 

a. Binomial test

Example: In a trial of a COVID vaccine, 65 out of 100 cases were effective, while the claimed effectiveness was 80%.

> binom.test(65, 100, 0.8)

Exact binomial test

data:  65 and 100

number of successes = 65, number of trials = 100, p-value = 0.0004141

alternative hypothesis: true probability of success is not equal to 0.8 (Significant difference noted) 

95 percent confidence interval:

 0.5481506 0.7427062

sample estimates:

probability of success 

                  0.65 
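For intuition, the exact two-sided p-value above can be reproduced by hand: sum the probabilities of every outcome that is no more likely than the observed 65 successes under H0: p = 0.8. This is a sketch of the idea behind the exact test:

```r
n <- 100; x <- 65; p0 <- 0.8
d <- dbinom(0:n, n, p0)       # probability of each possible outcome under H0
relErr <- 1 + 1e-7            # small tolerance for near-ties in the densities
p_value <- sum(d[d <= dbinom(x, n, p0) * relErr])
p_value                       # same exact p-value as binom.test(65, 100, 0.8)
```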

b. Wilcoxon signed rank test

File used: nonpara.xlsx https://drive.google.com/file/d/1dZWDJ9Sb309pgrUe2aLZUyGQsU4SFlxS/view?usp=sharing

Commands to follow: 

A. Import and attach file

B. Define factors 

> nonpara$Trt<-as.factor(nonpara$Trt)

> nonpara$mulcing<-as.factor(nonpara$mulcing)

C. Observe summary

> summary(nonpara)

      Trt     mulcing   yield_fefore    Yield_after   

 1      : 4   No :18   Min.   :1.780   Min.   :2.330  

 2      : 4   yes: 9   1st Qu.:1.780   1st Qu.:2.360  

 3      : 4   Yes: 9   Median :1.814   Median :2.405  

 4      : 4            Mean   :1.851   Mean   :2.440  

 5      : 4            3rd Qu.:1.864   3rd Qu.:2.473  

 6      : 4            Max.   :2.450   Max.   :2.910  

 (Other):12      

D. Check normality

1st way:

> shapiro.test(Yield_after)

Shapiro-Wilk normality test

data:  Yield_after

W = 0.78766, p-value = 9.35e-06

Interpretation: The p-value is less than 0.05, so the data are not normally distributed.

2nd way:

> res.aov <- aov(Yield_after ~ Trt , data = nonpara)

> summary(res.aov) 

            Df Sum Sq Mean Sq F value  Pr(>F)    

Trt          8 0.4212 0.05265   16.21 1.8e-08 ***

Residuals   27 0.0877 0.00325                    

---

Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

> model.tables(res.aov, type="means", se = TRUE)

Tables of means

Grand mean     

2.440389 

 Trt 
Trt
     1      2      3      4      5      6      7      8      9 
2.4535 2.3555 2.5100 2.4443 2.4443 2.3555 2.3638 2.3343 2.7025 

Standard errors for differences of means
           Trt
        0.0403
replic.      4

> aov_residuals <- residuals(object = res.aov)
> shapiro.test(x = aov_residuals )

Shapiro-Wilk normality test

data:  aov_residuals
W = 0.8176, p-value = 3.715e-05

> plot(res.aov,2)

Note: As long as the points lie close to the dotted line we can assume normality; here, however, several points clearly deviate from it.
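The same Q-Q check can be produced directly with qqnorm() and qqline(). A minimal sketch on simulated right-skewed data, standing in for the real response:

```r
set.seed(2)
y <- rexp(36)        # simulated skewed response
qqnorm(y)            # points bend away from the reference line when data are skewed
qqline(y, lty = 2)   # dashed reference line, analogous to plot(res.aov, 2)
```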

See the normal curve over the histogram

> g <- yield_fefore

> hist(g)

> m <- mean(g)

> std <- sqrt(var(g))

> hist(g, density=20, breaks=12, prob=TRUE,
+       xlab="yield_fefore", ylim=c(0, 9.5),
+       main="normal curve over histogram")

> curve(dnorm(x, mean=m, sd=std),
+       col="red", lwd=2, add=TRUE, yaxt="n")


Note: The data are positively skewed.
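The visual impression can be backed up with a sample skewness statistic (positive for right-skewed data). This sketch uses simulated data in place of yield_fefore:

```r
# Sample skewness: the third standardized moment; > 0 means positive (right) skew
skewness <- function(x) {
  n <- length(x); m <- mean(x)
  (sum((x - m)^3) / n) / (sum((x - m)^2) / n)^1.5
}
set.seed(1)
g <- rexp(100)   # exponential data are right-skewed (theoretical skewness = 2)
skewness(g)      # a clearly positive value
```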

Check homogeneity of variance

> bartlett.test(yield_fefore~Trt)

Bartlett test of homogeneity of variances

data:  yield_fefore by Trt

Bartlett's K-squared = Inf, df = 8, p-value < 2.2e-16

Note: As the p-value is less than 0.05, the variances are not homogeneous.

We can also use Levene's test for this (leveneTest() is provided by the car package).

> leveneTest(yield_fefore~Trt, data = nonpara)

Levene's Test for Homogeneity of Variance (center = median)

      Df F value Pr(>F)

group  8  1.1062 0.3898

      27               

Note: Bartlett's test assumes normality, whereas Levene's test is more robust to departures from it; with non-normal data like these, that is why the two tests can disagree.

As none of the checks showed that the data are normal, we can move on to nonparametric tests.

E. Wilcoxon signed rank test (alternative to the paired-sample t test)

> wilcox.test(yield_fefore,Yield_after, paired = TRUE)

Wilcoxon signed rank test with continuity correction

data:  yield_fefore and Yield_after

V = 0, p-value = 1.712e-07

alternative hypothesis: true location shift is not equal to 0

Note: As the p-value is less than 0.05, a significant difference was observed between yield before and yield after.
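The V statistic above is the sum of the ranks of |before − after| over the pairs where the first value exceeds the second, so V = 0 means yield increased in every single pair. A sketch on simulated paired data (names are illustrative):

```r
set.seed(42)
before <- rnorm(10, mean = 2)
after  <- before + abs(rnorm(10, 0.5, 0.1))  # every pair increases
d <- before - after                          # all differences negative here
V <- sum(rank(abs(d))[d > 0])                # rank sum of positive differences
V   # 0 here, matching wilcox.test(before, after, paired = TRUE)$statistic
```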

F. Mann Whitney Wilcoxon test

File used: t.xlsx https://drive.google.com/file/d/1DEFqFbemqia3FlEPyIPxeGQSIApbs8c_/view?usp=sharing

> wilcox.test(yield_before~INM)

Wilcoxon rank sum test with continuity correction

data:  yield_before by INM

W = 0, p-value = 0.001723

Note: A significant difference in yield was noted between the practice of integrated nutrient management and not practicing it, as p < 0.01.
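Similarly, the W statistic reported above is the rank sum of the first sample minus n1(n1 + 1)/2, so W = 0 means every value in one group is smaller than every value in the other. A sketch with two fully separated simulated groups:

```r
set.seed(3)
x <- rnorm(8)               # e.g. yields without the practice
y <- rnorm(8, mean = 10)    # e.g. yields with the practice, far higher
r <- rank(c(x, y))
W <- sum(r[1:8]) - 8 * (8 + 1) / 2   # Mann-Whitney statistic for sample x
W   # 0: all of x ranks below all of y, as in the output above
```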

G. Kruskal Wallis test

File used: nonpara.xlsx 

> kruskal.test(yield_fefore~Trt, data=nonpara)

Kruskal-Wallis rank sum test

data:  yield_fefore by Trt

Kruskal-Wallis chi-squared = 28.844, df = 8, p-value = 0.0003377

Note: A significant difference in yield was observed among the treatments.
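The chi-squared value reported here is the Kruskal-Wallis H statistic, computed from rank sums. A sketch of the no-ties formula on a simulated 9-treatment, 4-replicate layout:

```r
set.seed(1)
y <- rnorm(36)                    # simulated yields
g <- factor(rep(1:9, each = 4))   # 9 treatments, 4 replicates each
n <- length(y); r <- rank(y)
R <- tapply(r, g, sum)            # rank sum per treatment
H <- 12 / (n * (n + 1)) * sum(R^2 / 4) - 3 * (n + 1)
H   # equals kruskal.test(y ~ g)$statistic when there are no ties
```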

Now for mean separation, use the following command. 

> library(PMCMR)

> library(PMCMRplus)

> posthoc.kruskal.dunn.test(x=yield_fefore, g=Trt, p.adjust.method = "bonferroni")

Pairwise comparisons using Dunn's test for multiple comparisons of independent samples

data: yield_fefore and Trt

  1      2      3      4      5      6      7      8     

2 1.0000 -      -      -      -      -      -      -     

3 1.0000 0.6591 -      -      -      -      -      -     

4 1.0000 1.0000 1.0000 -      -      -      -      -     

5 1.0000 1.0000 1.0000 1.0000 -      -      -      -     

6 1.0000 1.0000 0.6591 1.0000 1.0000 -      -      -     

7 0.3524 1.0000 0.1040 1.0000 1.0000 1.0000 -      -     

8 0.3524 1.0000 0.1040 1.0000 1.0000 1.0000 1.0000 -     

9 1.0000 0.0878 1.0000 1.0000 1.0000 0.0878 0.0094 0.0094

Note: Significant differences were seen between treatments 7 vs 9 and 8 vs 9.

H. Van der Waerden test

> vanWaerden.test(x=yield_fefore,g=Trt)

Van der Waerden normal scores test

data:  yield_fefore and Trt

Van der Waerden chi-squared = 28.264, df = 8, p-value = 0.0004266

alternative hypothesis: true location shift is not equal to 0

Note: A significant difference was observed in this case as well.
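The "normal scores" in this test are obtained by converting ranks to standard normal quantiles, qnorm(rank/(n + 1)); the test then compares treatment groups on these scores. A minimal sketch of the scoring step (the full statistic is what vanWaerden.test computes):

```r
set.seed(4)
y <- rexp(36); g <- factor(rep(1:9, each = 4))   # simulated one-way layout
n <- length(y)
A <- qnorm(rank(y) / (n + 1))   # van der Waerden normal scores
tapply(A, g, mean)              # group means far from 0 drive the statistic
```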

> posthoc.vanWaerden.test(x=yield_fefore,g=Trt,p.adjust.method="none")

Pairwise comparisons using van der Waerden normal scores test for multiple comparisons of independent samples

data: yield_fefore and Trt


  1       2       3       4       5       6       7       8      

2 0.00138 -       -       -       -       -       -       -      

3 0.23618 5.6e-05 -       -       -       -       -       -      

4 0.09282 0.07945 0.00643 -       -       -       -       -      

5 0.40986 0.01107 0.05033 0.37334 -       -       -       -      

6 0.00138 1.00000 5.6e-05 0.07945 0.01107 -       -       -      

7 6.0e-05 0.24680 2.3e-06 0.00566 0.00056 0.24680 -       -      

8 6.0e-05 0.24680 2.3e-06 0.00566 0.00056 0.24680 1.00000 -      

9 0.01343 1.2e-06 0.16302 0.00016 0.00171 1.2e-06 5.9e-08 5.9e-08

Note: Significant differences can be noted between the treatment pairs whose p-values are below 0.05.







1 Comment:

At August 26, 2023 at 7:55 PM , Blogger Lakshya said...

Sir, Which non parametric test we should do for 2 factor CRD and How can we do it in R? And how to interpret and show that in table in paper?

 
