Monday, July 20, 2020

Standard deviation and standard error of mean

What is standard deviation?

Standard deviation is used very frequently in statistics. It measures the amount of dispersion, or scatter, of a set of values around the mean. A low standard deviation shows that most of the values are close to the mean. In contrast, a high standard deviation indicates that the values are spread over a wide range. It is denoted by sigma (σ) and is calculated as the square root of the variance. Variance is the average of the squared distances of the values in a dataset from their mean.



In the figure above, the green dot is the mean. In the first case the standard deviation is low because the data are close to the mean, but in the second case the data are scattered away from the mean, so the standard deviation is higher. So, we can say that the standard deviation is a measure of variability.

Let us consider an example. The yields (kg/ha) obtained for a crop are: 600, 470, 170, 430 and 300.

The mean yield is 394 kg/ha. Now let's calculate the difference of each yield from the mean. It will be 206, 76, -224, 36 and -94.

Now the variance will be:
\sigma^2 = \frac{(206)^2 + (76)^2 + (-224)^2 + (36)^2 + (-94)^2}{5} = \frac{108520}{5} = 21704

Taking its square root, we obtain the standard deviation as 147.32 kg/ha.
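The arithmetic above can be checked with a short Python sketch. It uses the standard library's `statistics` module and, like the worked example, treats the five yields as a whole population (dividing by N = 5). The variable names are only illustrative.

```python
import statistics

# Yields (kg/ha) from the worked example
yields = [600, 470, 170, 430, 300]

mean_yield = statistics.mean(yields)
# Difference of each yield from the mean
deviations = [y - mean_yield for y in yields]

# Population variance: average of the squared deviations (divide by N)
variance = statistics.pvariance(yields)
# Population standard deviation: square root of the variance
std_dev = statistics.pstdev(yields)

print(f"mean = {mean_yield:.0f} kg/ha")
print(f"variance = {variance:.0f}")
print(f"standard deviation = {std_dev:.2f} kg/ha")  # 147.32
```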

Formula for standard deviation:

A. When your data cover the whole population, the standard deviation is calculated by:
\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2}

B. When your data are a sample, the standard deviation is calculated by:
s = \sqrt{\frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{x})^2}

Here N is the number of values, \mu is the population mean and \bar{x} is the sample mean.
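As a minimal sketch, the two formulas can be written out directly in Python. The function names `population_sd` and `sample_sd` are made up for this illustration; the only difference between them is the divisor (N versus N - 1).

```python
import math

def population_sd(values):
    """Formula A: divide the sum of squared deviations by N."""
    n = len(values)
    mu = sum(values) / n
    return math.sqrt(sum((x - mu) ** 2 for x in values) / n)

def sample_sd(values):
    """Formula B: divide the sum of squared deviations by N - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("sample_sd needs at least two values")
    x_bar = sum(values) / n
    return math.sqrt(sum((x - x_bar) ** 2 for x in values) / (n - 1))

# Yields (kg/ha) from the example above
yields = [600, 470, 170, 430, 300]
print(round(population_sd(yields), 2))  # 147.32
print(round(sample_sd(yields), 2))      # 164.71
```

For the same data, the sample formula always gives the larger value, and the gap shrinks as N grows.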


For data with a normal distribution, about 95% of individuals will have values within 2 standard deviations of the mean. The other 5% will be split equally between values above the upper limit and values below the lower limit.
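The 95% figure can be checked against the normal distribution itself. The sketch below uses `statistics.NormalDist` from the Python standard library; it shows that the share within 2 standard deviations is slightly above 95% (exactly 95% corresponds to about 1.96 standard deviations) and that the remainder is split evenly between the two tails.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1
standard_normal = NormalDist(mu=0.0, sigma=1.0)

# Share of values within 2 standard deviations of the mean
within_2_sd = standard_normal.cdf(2) - standard_normal.cdf(-2)
# Share of values in each tail beyond 2 standard deviations
upper_tail = 1 - standard_normal.cdf(2)
lower_tail = standard_normal.cdf(-2)

print(f"within 2 SD: {within_2_sd:.4f}")  # 0.9545
print(f"lower tail:  {lower_tail:.4f}")   # 0.0228
print(f"upper tail:  {upper_tail:.4f}")   # 0.0228
```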


What is standard error?

It is a statistical term that measures the accuracy with which a sample represents a population. The standard error describes how far a sample mean is likely to deviate from the actual population mean. It is the approximate standard deviation of the sampling distribution of the mean. In fact, the sample mean is unlikely to be exactly the same as the population mean. Different samples will give different estimates, and this is due to sampling variation.

Suppose we replicated a study five times with five observations each time. This will give five means.

There will be five standard deviations, one for each set of measurements. If we plot all five means on the same number line, we will see them as shown in the figure below.


[Figure labels: "This is the mean of the means" and "This is the S.D. of the means (on both sides)"]

The standard deviation of the means is called the standard error.

The standard error tends to decrease as the sample size increases. The standard error of the mean can be described as the standard deviation of the sample means taken from a population. A smaller standard error indicates that the sample is more representative of the population.

SE = \sigma / \sqrt{n}, where \sigma is the standard deviation and n is the sample size.
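A minimal Python sketch of this formula follows. It assumes that the population σ is unknown, as is usual in practice, and substitutes the sample standard deviation. The function name `standard_error` and the measurements are hypothetical, chosen only to show the call.

```python
import math
import statistics

def standard_error(sample):
    """Estimate SE = s / sqrt(n), with s the sample standard deviation."""
    n = len(sample)
    return statistics.stdev(sample) / math.sqrt(n)

# Hypothetical measurements, used only to illustrate the call
measurements = [12.0, 15.0, 11.0, 14.0, 13.0]
print(round(standard_error(measurements), 3))  # 0.707
```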

Decision criteria: 

When we take samples from a population, the mean varies from one sample to another. The distribution of these sample means is the sampling distribution, and its standard deviation tells us how much the sample means vary. This is called the standard error. So, in reality, the standard error is a type of standard deviation. The standard error is also a measure of the precision of a sample mean. It decreases as the sample size increases, whereas the standard deviation does not change systematically with sample size. The standard deviation quantifies the variation within a set of measurements, whereas the standard error quantifies the variation in the means from multiple sets of measurements.
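The idea that the standard error is the standard deviation of the sample means, and that it shrinks with sample size, can be illustrated with a small simulation. This is only a sketch: the population mean of 100, the standard deviation of 10, the number of repeats and the random seed are all assumptions made for the illustration.

```python
import math
import random
import statistics

rng = random.Random(42)          # fixed seed so the run is repeatable
POP_MEAN, POP_SD = 100.0, 10.0   # hypothetical population parameters
N_REPEATS = 2000                 # number of samples drawn per sample size

def sd_of_sample_means(sample_size):
    """Draw N_REPEATS samples and return the SD of their means."""
    means = []
    for _ in range(N_REPEATS):
        sample = [rng.gauss(POP_MEAN, POP_SD) for _ in range(sample_size)]
        means.append(statistics.fmean(sample))
    return statistics.stdev(means)

results = {}
for n in (5, 20, 80):
    empirical = sd_of_sample_means(n)       # SD of the simulated sample means
    theoretical = POP_SD / math.sqrt(n)     # sigma / sqrt(n)
    results[n] = (empirical, theoretical)
    print(f"n={n:3d}  SD of sample means={empirical:.2f}  sigma/sqrt(n)={theoretical:.2f}")
```

In each row the two numbers should be close, and both get smaller as n increases, while the population standard deviation stays at 10 throughout.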

If we want to show the scatter of the measurements, we can use the standard deviation. If we want to indicate the uncertainty around the estimate of the mean, we quote the standard error of the mean.

In many articles, the ± sign is used for both the standard deviation and the standard error, so the author should clearly indicate whether S.D. or SEM has been used.

Let's discuss with an example as shown in the table below:

            Variety A yield (kg/ha)    Variety B yield (kg/ha)
            92                         98
            88                         99
            100                        100
            106                        101
            114                        102
Average     100                        100

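In both cases the average yield is 100 kg/ha. But now let's look at the standard deviation. The values below use the sample formula (dividing by N - 1).

For variety A, the standard deviation is 10.48 kg/ha and the standard error is 4.68 kg/ha.

For variety B, the standard deviation is 1.58 kg/ha and the standard error is 0.706 kg/ha.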
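These figures can be reproduced with a short Python sketch that applies the sample standard deviation (dividing by N - 1) and SE = SD / sqrt(n) to the table. The helper name `summarize` is made up for the illustration. The results agree with the values quoted above up to rounding in the last digit.

```python
import math
import statistics

# Yields (kg/ha) from the table
variety_a = [92, 88, 100, 106, 114]
variety_b = [98, 99, 100, 101, 102]

def summarize(yields):
    """Return (mean, sample SD, standard error of the mean)."""
    mean = statistics.mean(yields)
    sd = statistics.stdev(yields)          # sample SD (divides by n - 1)
    se = sd / math.sqrt(len(yields))       # SE = SD / sqrt(n)
    return mean, sd, se

for name, yields in (("A", variety_a), ("B", variety_b)):
    mean, sd, se = summarize(yields)
    print(f"Variety {name}: mean={mean:.0f}, SD={sd:.3f}, SE={se:.3f}")
```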

You can also calculate the standard deviation using the link below:
https://www.mathsisfun.com/data/standard-deviation-calculator.html 


It shows that variety A has more variability and scatter. But we still don't know whether our data are precise. In other words, we are not sure whether the sample mean truly represents the population mean. For this, we should know the standard error of the mean.

Suppose for variety A, we conduct multiple trials and obtain various means, e.g. 100, 104, 96, 97, 103, 102, 101, 98, 107, 95, 105, 95, 96, 104, 106. If we calculate the standard deviation of these mean values, it is called the standard error of the mean.

If we calculate the standard deviation of the above means, we find it to be 4.19. By the formula for the standard error, it was calculated as 4.68, which is similar.
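This comparison can be reproduced with the sketch below. The 4.68 figure matches the formula-based standard error of the variety A sample in the table, so that sample is used here; the small difference from 4.68 in the last digit comes from rounding the standard deviation before dividing.

```python
import math
import statistics

# Means from the repeated trials listed in the text
trial_means = [100, 104, 96, 97, 103, 102, 101, 98, 107, 95, 105, 95, 96, 104, 106]
# Standard deviation of the means, i.e. the observed standard error
sd_of_means = statistics.stdev(trial_means)

# Single sample of five yields (variety A in the table)
variety_a = [92, 88, 100, 106, 114]
# Standard error estimated from that one sample: SD / sqrt(n)
se_from_formula = statistics.stdev(variety_a) / math.sqrt(len(variety_a))

print(f"SD of the trial means: {sd_of_means:.2f}")      # 4.19
print(f"SE from the formula:   {se_from_formula:.2f}")  # 4.69
```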

In an actual scenario, we don't have enough time and resources to draw multiple samples from a population to obtain the standard error. So, we can use the formula to calculate the SEM from a single sample. If we have a large standard deviation with a small sample size, we are more likely to estimate the mean incorrectly. In such a case, the standard error will be large. The standard error helps to gauge the effect of sampling and measurement errors on the estimated mean.
