<Normal Approximation to Binomial>
The binomial distribution is the distribution of the sum of n independent and identically distributed Bernoulli random variables. It has an important property called the 'Normal approximation to Binomial.'
When X~bin(n,p), we can approximate the binomial distribution with a normal distribution that has mean np and variance np(1-p), provided n is large enough and p is not too far from 1/2. A common rule of thumb for this condition is np>5 and n(1-p)>5.
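The rule of thumb above can be checked in a few lines of R. This is only a minimal sketch: the helper name normal_approx_params is hypothetical (it is not a built-in R function), and the threshold 5 is the one quoted above.

```r
# Hypothetical helper (not a base R function): checks the rule of thumb
# np > 5 and n(1-p) > 5, and returns the parameters of the approximating normal.
normal_approx_params <- function(n, p) {
  list(
    ok   = (n * p > 5) && (n * (1 - p) > 5),  # TRUE when the rule of thumb holds
    mean = n * p,                             # mean of Bin(n, p)
    sd   = sqrt(n * p * (1 - p))              # standard deviation of Bin(n, p)
  )
}

normal_approx_params(100, 0.5)  # rule satisfied: np = 50, n(1-p) = 50
normal_approx_params(3, 0.2)    # rule not satisfied: np = 0.6
```

Note that the helper returns the standard deviation rather than the variance, because rnorm() and pnorm() take the standard deviation as their third argument.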
Let me show this property with R.
<rbinom(n,size,prob)>
First, draw random samples from a binomial distribution with rbinom(n, size, prob), where n is the number of observations, size is the number of trials, and prob is the probability of success on each trial.
#Making 200 random samples from a binomial distribution with 1 trial and 0.52 probability of getting a success.
> rbinom(200,1,0.52)
[1] 0 1 1 0 1 0 1 0 0 0 1 0 1 1 0 0 0 0 0 0 1 1 0 1 1
[26] 0 1 1 1 0 0 1 1 0 0 1 0 0 0 1 0 0 0 0 0 1 0 0 0 1
[51] 1 0 1 0 0 1 1 0 0 0 1 0 0 1 0 1 0 1 0 0 0 0 0 1 1
[76] 0 0 0 1 1 1 1 1 0 0 1 1 1 0 0 1 1 0 0 1 1 0 1 0 0
[101] 0 0 1 1 1 1 0 1 1 0 0 1 0 1 1 0 1 0 0 1 0 1 0 0 1
[126] 0 1 1 0 1 1 0 1 0 0 1 1 1 1 1 1 0 1 0 1 1 0 1 0 1
[151] 0 0 1 1 1 0 1 1 1 1 1 0 0 1 0 0 1 1 1 1 1 0 1 0 1
[176] 0 1 1 0 1 1 1 0 0 1 0 1 0 0 0 1 0 1 0 1 0 1 1 1 1
#Making 100 random samples from a binomial distribution with 3 trials and 0.2 probability of getting a success.
Because the probability of success is only 0.2, the value 3 (3 successes in 3 trials) is rare in rbinom(100,3,0.2): its probability is 0.2^3 = 0.008, and it does not appear at all in the run below.
> rbinom(100,3,0.2)
[1] 1 0 1 0 0 0 0 0 0 0 0 0 1 1 1 2 1 0 0 1 0 0 1 0 1
[26] 2 1 2 1 0 1 1 0 2 0 0 1 0 1 2 1 0 1 0 0 0 1 0 1 2
[51] 1 2 1 1 0 0 1 1 1 0 2 0 0 0 1 0 1 1 1 0 0 1 0 1 0
[76] 1 1 2 0 0 1 1 1 1 1 0 1 1 0 2 1 1 1 0 2 2 0 1 0 0
#Making 100 random samples from a binomial distribution with 3 trials and 0.7 probability of getting a success.
In this case, the probability of success on each trial is 0.7, which is why, among 3 trials, we get 2 or 3 successes more often than 0 or 1.
> rbinom(100,3,0.7)
[1] 3 2 2 3 3 2 3 2 2 2 2 3 2 2 3 3 2 2 2 2 3 3 1 2 3
[26] 1 1 1 1 3 3 3 2 2 2 2 1 2 2 1 3 3 3 2 2 3 2 3 3 3
[51] 3 3 3 3 2 2 2 1 3 2 2 2 2 1 3 2 2 2 1 2 1 3 2 3 2
[76] 2 2 2 2 3 2 3 3 2 2 1 2 2 2 3 3 2 2 1 2 3 3 1 2 3
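To make the claim about which outcomes are more frequent concrete, the observed proportions can be set next to the exact probabilities from dbinom(). The following is a sketch only: the seed and the sample size of 10000 are arbitrary choices made here, not values from the runs above.

```r
set.seed(1)  # arbitrary seed, only so that the run is reproducible

n_obs <- 10000                 # assumed sample size, larger than the 100 used above
x <- rbinom(n_obs, 3, 0.7)

# Observed proportion of each outcome 0..3
# (factor() keeps outcomes that happen to have a zero count)
observed <- as.numeric(table(factor(x, levels = 0:3))) / n_obs

# Exact probabilities P(X = 0), ..., P(X = 3) for X ~ bin(3, 0.7)
expected <- dbinom(0:3, 3, 0.7)

round(rbind(observed, expected), 3)
```

The columns correspond to 0, 1, 2 and 3 successes, so the two rows should be close to each other, with most of the mass on 2 and 3.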
As you can see from the histogram of rbinom(1000,100,0.5), the samples approximately follow a normal distribution: the shape is roughly symmetric and bell-shaped.
> hist(rbinom(1000,100,0.5))
To compare with the histogram of 1000 random samples from bin(100,0.5), I draw 1000 random samples from N(100*0.5, 100*0.5*0.5) using rnorm(n, mean, sd), where n is the number of observations and the two parameters are the mean and the standard deviation. Here 100*0.5 is the mean of the binomial distribution and 100*0.5*0.5 is its variance, so the standard deviation passed to rnorm() is sqrt(100*0.5*0.5).
> hist(rnorm(1000,50,sqrt(100*0.5*0.5)))
[Figures: histogram of samples from Bin(100,0.5); histogram of samples from N(100*0.5, 100*0.5*0.5)]
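Besides comparing two histograms by eye, the binomial probabilities and the normal density can be compared point by point. This is a rough sketch, not part of the original demonstration; the variable names are chosen here for illustration.

```r
n <- 100
p <- 0.5
k <- 0:n   # every possible number of successes

pmf     <- dbinom(k, n, p)                         # exact P(X = k)
density <- dnorm(k, n * p, sqrt(n * p * (1 - p)))  # normal density evaluated at k

# Largest pointwise gap between the two curves
max(abs(pmf - density))
```

A small maximum gap means the normal density traces the binomial probabilities closely, which is what the similar histogram shapes suggest.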
<Compute Binomial distribution with Normal Approximation>
You should apply a continuity correction when approximating a binomial distribution with a normal distribution.
Continuity correction: an adjustment needed when a discrete distribution is approximated by a continuous distribution.
For example, the binomial distribution is discrete and the normal distribution is continuous, so each integer k is treated as the interval from k-0.5 to k+0.5.
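The correction can be wrapped in a small function. This is a minimal sketch under the assumption that we want P(a <= X <= b) for integer a and b; the name approx_binom_prob is hypothetical and not part of base R.

```r
# Hypothetical helper: normal approximation of P(a <= X <= b) for X ~ bin(n, p),
# widening the interval by 0.5 on each side (the continuity correction).
approx_binom_prob <- function(a, b, n, p) {
  mu    <- n * p
  sigma <- sqrt(n * p * (1 - p))
  pnorm(b + 0.5, mu, sigma) - pnorm(a - 0.5, mu, sigma)
}

approx_binom_prob(30, 50, 100, 0.5)  # normal approximation
sum(dbinom(30:50, 100, 0.5))         # exact binomial probability, for comparison
```

Setting a equal to b gives the approximation of a single point probability P(X = a).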
# Probability of getting from 30 to 50 successes among 100 trials with 0.5 probability of success.
> sum(dbinom(30:50,100,0.5))
[1] 0.5397785
# Probability of x below 50.5 minus probability of x below 29.5, i.e. the area between 29.5 and 50.5
> pnorm(50.5,50,sqrt(100*0.5*0.5))-pnorm(29.5,50,sqrt(100*0.5*0.5))
[1] 0.5398072
#probability of getting from 50 to 100 successes among 100 trials with 0.5 probability of success.
> sum(dbinom(50:100,100,0.5))
[1] 0.5397946
# Probability of x above 49.5, i.e. the area to the right of 49.5
> 1-pnorm(49.5,50,sqrt(100*0.5*0.5))
[1] 0.5398278
# Probability of getting exactly 50 successes among 100 trials with 0.5 probability of success.
> dbinom(50,100,0.5)
[1] 0.07958924
# Probability of x below 50.5 minus probability of x below 49.5, i.e. the area between 49.5 and 50.5
> pnorm(50.5,50,sqrt(100*0.5*0.5))-pnorm(49.5,50,sqrt(100*0.5*0.5))
[1] 0.07965567
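To see why the 0.5 adjustment matters, the same probability can be approximated with and without it. This is a sketch for the 30-to-50 case only; the variable names are illustrative.

```r
mu    <- 100 * 0.5
sigma <- sqrt(100 * 0.5 * 0.5)

exact       <- sum(dbinom(30:50, 100, 0.5))                     # exact binomial value
corrected   <- pnorm(50.5, mu, sigma) - pnorm(29.5, mu, sigma)  # with continuity correction
uncorrected <- pnorm(50, mu, sigma) - pnorm(30, mu, sigma)      # without correction

c(exact = exact, corrected = corrected, uncorrected = uncorrected)

abs(corrected - exact)    # error with the correction
abs(uncorrected - exact)  # error without the correction
```

Without the correction, the normal area stops at 50 and so misses roughly half of the probability mass that belongs to the outcome X = 50, which is why its error is much larger.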