A quick experiment in R can reveal the impact of sample size on the estimates we make from data. A small sample gives us less information about the process or system from which we’re collecting data, while a large sample lets us pin our estimates down far more precisely. See the earlier post on sample size, confidence intervals and related topics on R Explorations.
Using the “animation” package once again, I’ve put together a simple animation to describe this.
#package containing the saveGIF function
library(animation)

#setting GIF options
ani.options(interval = 0.12, ani.width = 480, ani.height = 320)

#a function that draws one histogram per iteration; saveGIF records each plot as a frame
plo <- function(samplesize, mu, sigma, iter = 100){
  for (i in seq_len(iter)){
    #Generating a sample from the normal distribution
    x <- rnorm(samplesize, mu, sigma)

    #Histogram of samples as they're generated
    hist(x,
         main = paste0("N = ", samplesize,
                       ", xbar = ", round(mean(x), digits = 2),
                       ", s = ", round(sd(x), digits = 2)),
         xlim = c(5, 15),
         ylim = c(0, floor(samplesize / 3)),
         breaks = seq(4, 16, 0.5),
         col = rgb(0.1, 0.9, 0.1, 0.2),
         border = "grey",
         xlab = "x (Gaussian sample)")

    #Adding the estimate of the mean line to the histogram
    abline(v = mean(x), col = "red", lwd = 2)
  }
}

#Setting the parameters for the distribution
#(named sigma rather than sd so the variable does not shadow the sd() function)
mu <- 10.0
sigma <- 1.0

#One GIF per sample size; file names avoid spaces and commas
for (n in c(10, 50, 100, 500, 1000, 10000)){
  saveGIF(plo(n, mu, sigma),
          movie.name = paste0("N", n, "_mu", mu, "_sd", sigma, ".gif"))
}
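The animations show the red mean line jumping around for small N and settling down for large N. As a rough numerical companion, here is a minimal sketch that measures that jitter directly and compares it with the standard error of the mean, sigma / sqrt(N). The seed, the 200 repetitions per sample size, and the names `sizes`, `spread` and `result` are my own choices for illustration and are not part of the animation code above.

```r
# Assumed values, matching the post: true mean 10, true standard deviation 1
set.seed(42)  # arbitrary seed, only so the run is reproducible
mu <- 10
sigma <- 1
sizes <- c(10, 50, 100, 500, 1000, 10000)

# For each sample size, draw 200 samples and record how much the sample mean varies
spread <- sapply(sizes, function(n) {
  xbar <- replicate(200, mean(rnorm(n, mu, sigma)))
  sd(xbar)  # observed spread of the 200 sample means
})

# Compare the observed spread with the theoretical standard error sigma / sqrt(n)
result <- data.frame(N = sizes,
                     observed = spread,
                     theoretical = sigma / sqrt(sizes))
print(result)
```

The observed column should track the theoretical one closely, though not exactly, since each entry is itself estimated from only 200 repetitions.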