
(R) Normal approximation to Binomial

by jangpiano 2020. 9. 27.

<Normal Approximation to Binomial> 


The binomial distribution is the distribution of the sum of n independent and identically distributed Bernoulli random variables. It has an important property called the 'normal approximation to the binomial.'
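The "sum of Bernoulli variables" view can be checked directly by simulation. The sketch below is only an illustration. It assumes n = 100, p = 0.5, 10000 replications, and an arbitrary seed. Each column of a matrix of Bernoulli draws is summed, and the mean and variance of those sums are compared with np and np(1-p).

```r
# Sketch: build Bin(n, p) draws by summing n Bernoulli(p) draws
set.seed(1)   # arbitrary seed, only for reproducibility
n <- 100
p <- 0.5
reps <- 10000

# each column holds n Bernoulli(p) outcomes (rbinom with size = 1)
bern <- matrix(rbinom(n * reps, size = 1, prob = p), nrow = n)
sums <- colSums(bern)   # one binomial draw per column

# compare the sample moments with np and np(1 - p)
c(mean = mean(sums), var = var(sums), np = n * p, npq = n * p * (1 - p))
```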


When X ~ Bin(n, p), we can approximate the binomial distribution with a normal distribution with mean np and variance np(1-p), provided n is large enough and p is not too far from 1/2. A common rule of thumb is np > 5 and n(1-p) > 5.
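The parameters of the approximating normal and the rule of thumb above can be collected in a small helper. The name `normal_approx_params` is hypothetical and used only for illustration. It returns the standard deviation rather than the variance because rnorm() and pnorm() expect the sd.

```r
# Hypothetical helper: normal parameters for Bin(n, p) plus the np > 5, n(1-p) > 5 check
normal_approx_params <- function(n, p) {
  list(mean = n * p,
       sd   = sqrt(n * p * (1 - p)),        # standard deviation, not variance
       ok   = n * p > 5 && n * (1 - p) > 5) # rule-of-thumb condition
}

normal_approx_params(100, 0.5)  # mean 50, sd 5, ok TRUE
normal_approx_params(10, 0.2)   # np = 2, so ok is FALSE
```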


Let me show this property with R. 


<rbinom(n,size,prob)>


First, draw random samples from a binomial distribution with rbinom(n, size, prob), where n is the number of observations, size is the number of trials, and prob is the probability of success on each trial.



#Drawing 200 random values from a binomial distribution with 1 trial (i.e., a Bernoulli distribution) and success probability 0.52.

> rbinom(200,1,0.52)
  [1] 0 1 1 0 1 0 1 0 0 0 1 0 1 1 0 0 0 0 0 0 1 1 0 1 1
 [26] 0 1 1 1 0 0 1 1 0 0 1 0 0 0 1 0 0 0 0 0 1 0 0 0 1
 [51] 1 0 1 0 0 1 1 0 0 0 1 0 0 1 0 1 0 1 0 0 0 0 0 1 1
 [76] 0 0 0 1 1 1 1 1 0 0 1 1 1 0 0 1 1 0 0 1 1 0 1 0 0
[101] 0 0 1 1 1 1 0 1 1 0 0 1 0 1 1 0 1 0 0 1 0 1 0 0 1
[126] 0 1 1 0 1 1 0 1 0 0 1 1 1 1 1 1 0 1 0 1 1 0 1 0 1
[151] 0 0 1 1 1 0 1 1 1 1 1 0 0 1 0 0 1 1 1 1 1 0 1 0 1
[176] 0 1 1 0 1 1 1 0 0 1 0 1 0 0 0 1 0 1 0 1 0 1 1 1 1


#Drawing 100 random values from a binomial distribution with 3 trials and success probability 0.2.

Because the probability of success is only 0.2, the value 3 (three successes out of three trials) is rare in rbinom(100,3,0.2): its probability is 0.2^3 = 0.008, and it does not appear at all in the sample below.


> rbinom(100,3,0.2)
  [1] 1 0 1 0 0 0 0 0 0 0 0 0 1 1 1 2 1 0 0 1 0 0 1 0 1
 [26] 2 1 2 1 0 1 1 0 2 0 0 1 0 1 2 1 0 1 0 0 0 1 0 1 2
 [51] 1 2 1 1 0 0 1 1 1 0 2 0 0 0 1 0 1 1 1 0 0 1 0 1 0
 [76] 1 1 2 0 0 1 1 1 1 1 0 1 1 0 2 1 1 1 0 2 2 0 1 0 0
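How rare a 3 is can be made concrete with dbinom(). The sketch below computes P(X = 3) for Bin(3, 0.2), the expected number of 3s in 100 draws, and the chance that a sample of 100 draws contains no 3 at all, which comes out to roughly 45%. So a sample like the one above is not unusual.

```r
# P(X = 3) for X ~ Bin(3, 0.2), i.e. 0.2^3
p3 <- dbinom(3, size = 3, prob = 0.2)
expected_threes <- 100 * p3        # expected count of 3s in 100 draws
p_no_three <- (1 - p3)^100         # probability that no 3 appears in 100 draws
c(p3 = p3, expected = expected_threes, p_none = p_no_three)
```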


#Making 100 random samples from a binomial distribution with 3 trials and 0.7 probability of getting a success.

Here the probability of success on each trial is 0.7, so among 3 trials we get 2 or 3 successes much more often than 0 or 1.

> rbinom(100,3,0.7)
  [1] 3 2 2 3 3 2 3 2 2 2 2 3 2 2 3 3 2 2 2 2 3 3 1 2 3
 [26] 1 1 1 1 3 3 3 2 2 2 2 1 2 2 1 3 3 3 2 2 3 2 3 3 3
 [51] 3 3 3 3 2 2 2 1 3 2 2 2 2 1 3 2 2 2 1 2 1 3 2 3 2
 [76] 2 2 2 2 3 2 3 3 2 2 1 2 2 2 3 3 2 2 1 2 3 3 1 2 3

 

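table() counts the outcomes. Note that the call below draws a new sample of 100 values rather than reusing the one printed above.

> table(rbinom(100,3,0.7))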

 0  1  2  3 
 4 22 45 29 
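The observed counts can be set against the counts expected under Bin(3, 0.7), which are 100 times the exact probabilities from dbinom(). In this sketch the observed vector is typed in by hand from the table above. A new simulation would give different counts.

```r
observed <- c(4, 22, 45, 29)                        # counts copied from the table above
expected <- 100 * dbinom(0:3, size = 3, prob = 0.7) # 2.7, 18.9, 44.1, 34.3
counts <- rbind(observed, expected)
colnames(counts) <- 0:3                             # number of successes
counts
```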

<Graphs to show the property with R>


The histogram of rbinom(1000,100,0.5) produced by the code below is roughly bell-shaped and symmetric around 50, which is consistent with Bin(100, 0.5) being approximately normal.


> hist(rbinom(1000,100,0.5))

 


For comparison with the histogram of 1000 random samples from Bin(100, 0.5), I draw 1000 random samples from N(100*0.5, 100*0.5*0.5) using rnorm(), where 100*0.5 = 50 is the mean and 100*0.5*0.5 = 25 is the variance of the binomial distribution.


rnorm(n, mean, sd) takes the number of observations n, the mean, and the standard deviation (not the variance), so the variance 25 is passed as sd = sqrt(100*0.5*0.5) = 5.


> hist(rnorm(1000,50,sqrt(100*0.5*0.5)))
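The same comparison can also be drawn in one plot. This sketch overlays the N(50, 25) density on a density-scaled histogram of binomial samples. The seed, the half-integer breaks (one bar per integer value), and the title text are assumptions chosen for illustration.

```r
n <- 100
p <- 0.5
mu <- n * p                      # 50
sigma <- sqrt(n * p * (1 - p))   # 5, the sd of the approximating normal

set.seed(2020)                   # arbitrary seed for reproducibility
samples <- rbinom(1000, size = n, prob = p)

# breaks at half-integers so that each bar covers exactly one integer value
hist(samples, breaks = seq(min(samples) - 0.5, max(samples) + 0.5, by = 1),
     freq = FALSE, main = "Bin(100, 0.5) samples with N(50, 25) density")
curve(dnorm(x, mean = mu, sd = sigma), add = TRUE, lwd = 2)
```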


[Figure: histograms of the two samples, titled Bin(100,0.5) and N(100*0.5, 100*0.5*0.5)]
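Histograms of random samples are noisy. A deterministic check compares the exact binomial probabilities dbinom(k, 100, 0.5) with the normal density dnorm(k, 50, 5) at every integer k. This is only a sketch: it measures the largest pointwise gap and does not give a formal error bound.

```r
k <- 0:100
exact  <- dbinom(k, size = 100, prob = 0.5)   # binomial pmf
approx <- dnorm(k, mean = 50, sd = 5)         # normal density at the same points
gap <- abs(exact - approx)
max(gap)              # largest pointwise difference
k[which.max(gap)]     # where that difference occurs
```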

 

 


<Compute Binomial distribution with Normal Approximation>


You should apply a continuity correction when approximating a binomial distribution with a normal distribution.

Continuity correction: an adjustment needed when a discrete distribution is approximated by a continuous distribution.

For example, the binomial distribution is discrete while the normal distribution is continuous, so each integer value k of X ~ Bin(n, p) is represented by the interval from k - 0.5 to k + 0.5. That is, P(a ≤ X ≤ b) ≈ P(a - 0.5 < Y < b + 0.5), where Y ~ N(np, np(1-p)).
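The correction can be wrapped in a small function that returns both the exact binomial probability and its normal approximation for a range a to b. The name `binom_range_prob` is hypothetical. The function reproduces the numbers in the examples that follow.

```r
# Hypothetical helper: P(a <= X <= b) for X ~ Bin(n, p), exact vs. continuity-corrected normal
binom_range_prob <- function(a, b, n, p) {
  mu <- n * p
  sigma <- sqrt(n * p * (1 - p))
  c(exact     = sum(dbinom(a:b, n, p)),
    corrected = pnorm(b + 0.5, mu, sigma) - pnorm(a - 0.5, mu, sigma))
}

binom_range_prob(30, 50, 100, 0.5)   # 30 to 50 successes
binom_range_prob(50, 50, 100, 0.5)   # exactly 50 successes
```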


 

# Probability of getting from 30 to 50 successes (inclusive) among 100 trials with success probability 0.5.

> sum(dbinom(30:50,100,0.5))

[1] 0.5397785


# P(Y < 50.5) - P(Y < 29.5), i.e. the normal area between 29.5 and 50.5

> pnorm(50.5,50,sqrt(100*0.5*0.5))-pnorm(29.5,50,sqrt(100*0.5*0.5))

[1] 0.5398072
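To see why the half-unit shift matters, this sketch computes the same range without the correction, using 30 and 50 as the limits. The uncorrected value is about 0.49997, which is off by roughly 0.04. The corrected value is within about 3e-5 of the exact answer.

```r
sigma <- sqrt(100 * 0.5 * 0.5)
exact       <- sum(dbinom(30:50, 100, 0.5))
corrected   <- pnorm(50.5, 50, sigma) - pnorm(29.5, 50, sigma)  # with continuity correction
uncorrected <- pnorm(50, 50, sigma) - pnorm(30, 50, sigma)      # without correction
c(exact = exact, corrected = corrected, uncorrected = uncorrected)
```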



#probability of getting from 50 to 100 successes among 100 trials with 0.5 probability of success.  

> sum(dbinom(50:100,100,0.5))

[1] 0.5397946


# P(Y > 49.5), i.e. the normal area above 49.5 (binomial values 50 to 100 correspond to 49.5 to 100.5, and the normal mass above 100.5 is negligible)

> 1-pnorm(49.5,50,sqrt(100*0.5*0.5))

[1] 0.5398278
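With the correction, the range 50 to 100 strictly corresponds to the interval 49.5 to 100.5. This sketch confirms that cutting the normal at 100.5 instead of letting it run to infinity makes no practical difference, because 100.5 lies about 10 standard deviations above the mean. It also shows `lower.tail = FALSE`, an existing pnorm() argument, as an alternative to writing 1 - pnorm().

```r
sigma <- sqrt(100 * 0.5 * 0.5)
to_inf  <- 1 - pnorm(49.5, 50, sigma)                         # area above 49.5
to_100  <- pnorm(100.5, 50, sigma) - pnorm(49.5, 50, sigma)   # area from 49.5 to 100.5
upper   <- pnorm(49.5, 50, sigma, lower.tail = FALSE)         # same area as to_inf
c(to_inf = to_inf, to_100 = to_100, upper = upper)
```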


# Probability of getting exactly 50 successes among 100 trials with success probability 0.5.

> dbinom(50,100,0.5)

[1] 0.07958924


# P(Y < 50.5) - P(Y < 49.5), i.e. the normal area between 49.5 and 50.5

> pnorm(50.5,50,sqrt(100*0.5*0.5))-pnorm(49.5,50,sqrt(100*0.5*0.5))

[1] 0.07965567 
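For a single value the correction is essential rather than a refinement. A continuous distribution assigns probability zero to any single point, so without widening 50 to the interval 49.5 to 50.5 the approximation would give 0. The sketch below puts the three numbers side by side.

```r
sigma <- sqrt(100 * 0.5 * 0.5)
exact      <- dbinom(50, 100, 0.5)
corrected  <- pnorm(50.5, 50, sigma) - pnorm(49.5, 50, sigma)
no_correct <- pnorm(50, 50, sigma) - pnorm(50, 50, sigma)  # a single point has zero probability
c(exact = exact, corrected = corrected, no_correction = no_correct)
```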
