(R) Normal approximation to Binomial

<Normal Approximation to Binomial>

The binomial distribution, which is the distribution of the sum of n independent and identically distributed Bernoulli random variables, has an important property called the 'Normal approximation to Binomial.'
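Because a Bin(n, p) variable is a sum of n independent Bernoulli(p) variables, this definition can be checked by simulation. The following is a minimal sketch, assuming arbitrary illustration values n = 10, p = 0.52 and an arbitrary seed. It builds binomial draws by summing rows of Bernoulli draws and compares them with draws made directly by rbinom().

```r
set.seed(42)   # arbitrary seed, for reproducibility only
n <- 10        # number of Bernoulli trials per binomial draw (illustration value)
p <- 0.52      # success probability (illustration value)
reps <- 10000  # number of binomial draws to simulate

# Each row holds n Bernoulli(p) trials; summing a row gives one Bin(n, p) draw
bernoulli <- matrix(rbinom(reps * n, size = 1, prob = p), nrow = reps, ncol = n)
sums <- rowSums(bernoulli)

# Direct binomial draws for comparison
direct <- rbinom(reps, size = n, prob = p)

# Both sample means should be close to n * p = 5.2
mean(sums)
mean(direct)
```

The row-sum version only makes the "sum of Bernoulli trials" definition explicit; in practice rbinom() with size = n does the same job in one call.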

When X ~ Bin(n, p), we can approximate the binomial distribution with a normal distribution with mean np and variance np(1-p) if n is large enough and p is not too close to 0 or 1 (the approximation works best when p is close to 1/2). The usual rule of thumb is np > 5 and n(1-p) > 5.
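The rule of thumb can be written as a small R check. normal_approx_ok() is a hypothetical helper name used only for illustration; it encodes exactly the condition np > 5 and n(1-p) > 5 stated above.

```r
# Hypothetical helper: TRUE when the rule of thumb np > 5 and n(1 - p) > 5 holds
normal_approx_ok <- function(n, p) {
  n * p > 5 && n * (1 - p) > 5
}

normal_approx_ok(100, 0.5)  # TRUE:  np = 50, n(1 - p) = 50
normal_approx_ok(20, 0.1)   # FALSE: np = 2
normal_approx_ok(3, 0.7)    # FALSE: np = 2.1, n(1 - p) = 0.9
```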

Let me show this property with R.

<rbinom(n,size,prob)>

First, draw random samples from a binomial distribution with rbinom(n, size, prob), where n is the number of observations, size is the number of trials, and prob is the probability of success on each trial.

# Draw 200 random samples from a binomial distribution with 1 trial (a Bernoulli distribution) and success probability 0.52.

> rbinom(200,1,0.52)

[1] 0 1 1 0 1 0 1 0 0 0 1 0 1 1 0 0 0 0 0 0 1 1 0 1 1

[26] 0 1 1 1 0 0 1 1 0 0 1 0 0 0 1 0 0 0 0 0 1 0 0 0 1

[51] 1 0 1 0 0 1 1 0 0 0 1 0 0 1 0 1 0 1 0 0 0 0 0 1 1

[76] 0 0 0 1 1 1 1 1 0 0 1 1 1 0 0 1 1 0 0 1 1 0 1 0 0

[101] 0 0 1 1 1 1 0 1 1 0 0 1 0 1 1 0 1 0 0 1 0 1 0 0 1

[126] 0 1 1 0 1 1 0 1 0 0 1 1 1 1 1 1 0 1 0 1 1 0 1 0 1

[151] 0 0 1 1 1 0 1 1 1 1 1 0 0 1 0 0 1 1 1 1 1 0 1 0 1

[176] 0 1 1 0 1 1 1 0 0 1 0 1 0 0 0 1 0 1 0 1 0 1 1 1 1
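Below is a sketch of the same call with named arguments and a fixed seed, so the draws can be reproduced (the seed value 1 is an arbitrary assumption). Since each draw is 0 or 1, the sample mean is the observed proportion of successes and should land near prob.

```r
set.seed(1)  # arbitrary seed so the draws can be reproduced
x <- rbinom(n = 200, size = 1, prob = 0.52)  # 200 Bernoulli(0.52) draws

table(x)  # counts of 0s and 1s
mean(x)   # proportion of successes, expected to be near prob = 0.52
```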

# Draw 100 random samples from a binomial distribution with 3 trials and success probability 0.2.

Because the success probability is only 0.2, the value 3 (3 successes out of 3 trials) is rare in rbinom(100,3,0.2): its probability is 0.2^3 = 0.008, so in 100 draws we expect fewer than one, and in the output below it does not appear at all.

> rbinom(100,3,0.2)

[1] 1 0 1 0 0 0 0 0 0 0 0 0 1 1 1 2 1 0 0 1 0 0 1 0 1

[26] 2 1 2 1 0 1 1 0 2 0 0 1 0 1 2 1 0 1 0 0 0 1 0 1 2

[51] 1 2 1 1 0 0 1 1 1 0 2 0 0 0 1 0 1 1 1 0 0 1 0 1 0

[76] 1 1 2 0 0 1 1 1 1 1 0 1 1 0 2 1 1 1 0 2 2 0 1 0 0
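How rare each value is can be quantified with dbinom(), which returns exact binomial probabilities. Multiplying by 100 gives the expected counts in 100 draws.

```r
probs <- dbinom(0:3, size = 3, prob = 0.2)  # P(X = 0), P(X = 1), P(X = 2), P(X = 3)
probs        # 0.512 0.384 0.096 0.008
100 * probs  # expected counts in 100 draws: 51.2 38.4 9.6 0.8
```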

# Draw 100 random samples from a binomial distribution with 3 trials and success probability 0.7.

In this case the probability of success on each trial is 0.7, which is why, among 3 trials, we see 2 or 3 successes more often than 0 or 1.

> rbinom(100,3,0.7)

[1] 3 2 2 3 3 2 3 2 2 2 2 3 2 2 3 3 2 2 2 2 3 3 1 2 3

[26] 1 1 1 1 3 3 3 2 2 2 2 1 2 2 1 3 3 3 2 2 3 2 3 3 3

[51] 3 3 3 3 2 2 2 1 3 2 2 2 2 1 3 2 2 2 1 2 1 3 2 3 2

[76] 2 2 2 2 3 2 3 3 2 2 1 2 2 2 3 3 2 2 1 2 3 3 1 2 3

> table(rbinom(100,3,0.7))

0 1 2 3

4 22 45 29
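To compare an observed table with what Bin(3, 0.7) predicts, a sketch can compute the expected counts with dbinom(). Wrapping the draws in factor(..., levels = 0:3) is a choice made here so that a value that never occurs still shows up with count 0; the observed counts change with every random draw.

```r
expected <- 100 * dbinom(0:3, size = 3, prob = 0.7)
expected  # 2.7 18.9 44.1 34.3

# Observed counts from a fresh sample of 100 draws (varies from run to run)
observed <- table(factor(rbinom(100, size = 3, prob = 0.7), levels = 0:3))
rbind(observed = as.vector(observed), expected = expected)
```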

<Graphs to show the property with R>

The histogram of rbinom(1000,100,0.5) drawn by the command below is approximately symmetric and bell-shaped, which suggests that these samples approximately follow a normal distribution.

> hist(rbinom(1000,100,0.5))

To compare with this histogram of 1000 random samples from Bin(100, 0.5), I draw 1000 random samples from N(100*0.5, 100*0.5*0.5) using rnorm(), where 100*0.5 = 50 is the mean and 100*0.5*0.5 = 25 is the variance of the binomial distribution.

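rnorm(n, mean, sd) takes the number of observations n, the mean, and the standard deviation. Because the third argument is the standard deviation rather than the variance, we pass sqrt(100*0.5*0.5) = 5.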
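A minimal sketch that checks the moments numerically, assuming an arbitrary seed: it computes the mean np and the standard deviation sqrt(np(1-p)), passes the standard deviation to rnorm(), and compares the sample means and variances of both samples with 50 and 25.

```r
n <- 100
p <- 0.5
mu <- n * p                     # mean of Bin(100, 0.5): 50
sigma <- sqrt(n * p * (1 - p))  # standard deviation: sqrt(25) = 5

set.seed(2)  # arbitrary seed
binom_samples <- rbinom(1000, size = n, prob = p)
norm_samples  <- rnorm(1000, mean = mu, sd = sigma)  # rnorm() takes sd, not the variance

# Both samples should have a mean near 50 and a variance near 25
c(mean(binom_samples), var(binom_samples))
c(mean(norm_samples), var(norm_samples))
```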

> hist(rnorm(1000,50,sqrt(100*0.5*0.5)))

[Figure: histograms of the Bin(100, 0.5) samples and the N(100*0.5, 100*0.5*0.5) samples]
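Instead of comparing two separate histograms, one can overlay the approximating normal density on a density-scaled histogram of the binomial samples. This is a sketch under assumptions made here: an arbitrary seed, one bar centred on each integer value (breaks at half-integers), and hist(..., freq = FALSE) so that bar heights are on the same scale as dnorm().

```r
n <- 100
p <- 0.5
set.seed(3)  # arbitrary seed
samples <- rbinom(1000, size = n, prob = p)

# Density-scaled histogram with one bar centred on each integer value
hist(samples, freq = FALSE,
     breaks = seq(min(samples) - 0.5, max(samples) + 0.5, by = 1),
     main = "Bin(100, 0.5) samples with the N(50, 25) density",
     xlab = "number of successes")

# Overlay the approximating normal density N(np, np(1 - p))
curve(dnorm(x, mean = n * p, sd = sqrt(n * p * (1 - p))),
      add = TRUE, col = "red", lwd = 2)
```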

<Computing Binomial Probabilities with the Normal Approximation>

You should apply a continuity correction when approximating a binomial distribution with a normal distribution.

Continuity correction: an adjustment needed when a discrete distribution is approximated by a continuous distribution.

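For example, when we approximate the binomial distribution, which is discrete, with the normal distribution, which is continuous, each integer value k is treated as the interval from k - 0.5 to k + 0.5. So P(a ≤ X ≤ b) is approximated by the normal area between a - 0.5 and b + 0.5.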
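The continuity correction can be packaged in a small function. approx_binom_range() and exact_binom_range() are hypothetical helper names for illustration; they compute the corrected normal area and the exact binomial sum for P(a ≤ X ≤ b) with X ~ Bin(n, p).

```r
# Hypothetical helper: approximate P(a <= X <= b) for X ~ Bin(n, p)
# using N(np, np(1 - p)) with a continuity correction of 0.5 on each side
approx_binom_range <- function(a, b, n, p) {
  mu <- n * p
  sigma <- sqrt(n * p * (1 - p))
  pnorm(b + 0.5, mean = mu, sd = sigma) - pnorm(a - 0.5, mean = mu, sd = sigma)
}

# Hypothetical helper: exact binomial probability of the same event
exact_binom_range <- function(a, b, n, p) {
  sum(dbinom(a:b, size = n, prob = p))
}

approx_binom_range(30, 50, 100, 0.5)  # about 0.5398072
exact_binom_range(30, 50, 100, 0.5)   # about 0.5397785
```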

# Probability of getting from 30 to 50 successes among 100 trials with 0.5 probability of success.

> sum(dbinom(30:50,100,0.5))

[1] 0.5397785

# P(Y < 50.5) - P(Y < 29.5), where Y ~ N(50, 25): the normal area between 29.5 and 50.5

> pnorm(50.5,50,sqrt(100*0.5*0.5))-pnorm(29.5,50,sqrt(100*0.5*0.5))

[1] 0.5398072
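To see how much the correction matters, a sketch can repeat the calculation without the 0.5 adjustment. Without it, the normal area from 30 to 50 leaves out roughly half of the bar at 50, and P(X = 50) is about 0.08, so the estimate falls noticeably below the exact value.

```r
mu <- 100 * 0.5
sigma <- sqrt(100 * 0.5 * 0.5)

with_cc    <- pnorm(50.5, mu, sigma) - pnorm(29.5, mu, sigma)  # with continuity correction
without_cc <- pnorm(50,   mu, sigma) - pnorm(30,   mu, sigma)  # without correction
exact      <- sum(dbinom(30:50, 100, 0.5))                     # exact binomial probability

c(exact = exact, with_cc = with_cc, without_cc = without_cc)
```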

#probability of getting from 50 to 100 successes among 100 trials with 0.5 probability of success.

> sum(dbinom(50:100,100,0.5))

[1] 0.5397946

# P(Y > 49.5) = 1 - P(Y < 49.5): the normal area above 49.5 (the area above 100.5 is negligible)

> 1-pnorm(49.5,50,sqrt(100*0.5*0.5))

[1] 0.5398278
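The same upper-tail probabilities can be computed without summing dbinom() terms or subtracting from 1, by using pbinom() and pnorm() with lower.tail = FALSE. Note that pbinom(q, ..., lower.tail = FALSE) gives P(X > q), so P(X ≥ 50) needs q = 49.

```r
sigma <- sqrt(100 * 0.5 * 0.5)

# Exact P(X >= 50) = P(X > 49)
exact_upper <- pbinom(49, size = 100, prob = 0.5, lower.tail = FALSE)

# Normal approximation with continuity correction: P(Y > 49.5)
approx_upper <- pnorm(49.5, mean = 50, sd = sigma, lower.tail = FALSE)

c(exact_upper, approx_upper)  # about 0.5397946 and 0.5398278
```

Using lower.tail = FALSE also avoids the loss of precision that 1 - pnorm() can suffer when the tail probability is very small.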

# Probability of getting exactly 50 successes among 100 trials with 0.5 probability of success.

> dbinom(50,100,0.5)

[1] 0.07958924

# P(Y < 50.5) - P(Y < 49.5), where Y ~ N(50, 25): the normal area between 49.5 and 50.5

> pnorm(50.5,50,sqrt(100*0.5*0.5))-pnorm(49.5,50,sqrt(100*0.5*0.5))

[1] 0.07965567
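A single point probability is the special case a = b of the interval rule. As a sketch of how good the approximation is across the whole range, the code below approximates every P(X = k), k = 0, ..., 100, by the normal area between k - 0.5 and k + 0.5 and reports the largest absolute error.

```r
n <- 100
p <- 0.5
mu <- n * p
sigma <- sqrt(n * p * (1 - p))
k <- 0:n

exact  <- dbinom(k, size = n, prob = p)
# P(X = k) is approximated by the normal area between k - 0.5 and k + 0.5
approx <- pnorm(k + 0.5, mu, sigma) - pnorm(k - 0.5, mu, sigma)

max(abs(exact - approx))            # largest absolute error over all k
k[which.max(abs(exact - approx))]   # the value of k where it occurs
```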

Jangpiano Science
