(R) Two sample Bootstrap Method

<Two sample Bootstrap Method>

I explained why Bootstrap method is important and how to use it in previous post. The difference between the previous one and this post will be the number of samples we are interested in. We want to assume the relationship ,for example the difference between more than one populations, by using bootstrap method. Let's assume that we want to show that weights of male and women are 0, that is weights of men = weights of women. We set it as null hypothesis.

H0: M0=M1

H1: M0!=M1

Bootstrap Method is simply the way to make inference about population using smaller samples with replacement. The most important point here is 'with replacement.'

Firstly, we assume that there are two Independent groups A and B. Because the number of people is very large, we collect random variables from two groups. In the example below, the sample sizes for both group is 30. And as sample cannot be changed, we must set 'set.seed(3)' firstly.

Secondly, we devise the way to distinguish between male and female information

Next, make a numeric vector, length of 10000, and fill the vector in mean difference of 20 samples' sample with replacement between men and women weight.

> set.seed(3)

> gender=c(rep("male",20),rep("female",20))

> weight=c(runif(20,50,110),runif(20,50,90))

> d<-as.data.frame(cbind(gender,weight))

> d

gender weight

1 male 58.4020763169974

2 male 90.3758199536242

3 male 69.2471175687388

4 male 66.3867158582434

5 male 80.105033738073

6 male 80.2197027020156

7 male 56.2316722120158

8 male 64.7300462122075

9 male 78.8804959505796

10 male 81.5489637199789

11 male 75.6007948773913

12 male 75.2511957078241

13 male 76.7017676727846

14 male 77.862471784465

15 male 93.3959743822925

16 male 91.4854346658103

17 male 55.5724576697685

18 male 85.1844179444015

19 male 94.8744132183492

20 male 63.9866276877001

21 female 59.1280752513558

22 female 50.6131957005709

23 female 55.1592623442411

24 female 53.7352771405131

25 female 59.4754002988338

26 female 81.6458963789046

27 female 73.9892626274377

28 female 86.4059084374458

29 female 72.4169821571559

30 female 80.2281907666475

31 female 65.1668756455183

32 female 64.9312390759587

33 female 56.8116257153451

34 female 68.1322929449379

35 female 60.3365584183484

36 female 63.4506380930543

37 female 85.5833213869482

38 female 58.0778519529849

39 female 73.1674417108297

40 female 58.3052811957896

> d$weight<-as.numeric(d$weight)

> d

gender weight

1 male 58.40208

2 male 90.37582

3 male 69.24712

4 male 66.38672

5 male 80.10503

6 male 80.21970

7 male 56.23167

8 male 64.73005

9 male 78.88050

10 male 81.54896

11 male 75.60079

12 male 75.25120

13 male 76.70177

14 male 77.86247

15 male 93.39597

16 male 91.48543

17 male 55.57246

18 male 85.18442

19 male 94.87441

20 male 63.98663

21 female 59.12808

22 female 50.61320

23 female 55.15926

24 female 53.73528

25 female 59.47540

26 female 81.64590

27 female 73.98926

28 female 86.40591

29 female 72.41698

30 female 80.22819

31 female 65.16688

32 female 64.93124

33 female 56.81163

34 female 68.13229

35 female 60.33656

36 female 63.45064

37 female 85.58332

38 female 58.07785

39 female 73.16744

40 female 58.30528

> of_male=which(d$gender=="male")

> of_female=which(d$gender=="female")

> bs.means.difference=numeric(10000)

> for (i in 1:10000){

+ bs.means.difference[i]=mean(sample(d$weight[of_male],replace=TRUE))-mean(sample(d$weight[of_female],replace=TRUE))

+ }

T.obs is the value we can observe from the samples from the population, that is mean(samples from men)-mean(samples from women)

> T.obs=mean(d$weight[of_male])-mean(d$weight[of_female])

> T.obs

[1] 9.464131

> p.val = sum(abs(bs.means.difference-mean(bs.means.difference)) >= abs(T.obs))/10000

> p.val

[1] 0.0071

It is suggested to add 1 to both denominator and nominator while using > sign instead of >=

> p.val = (sum(abs(bs.means.difference-mean(bs.means.difference)) > abs(T.obs))+1)/(10000+1)

> p.val

[1] 0.00719928

<Histogram 1>

> hist(bs.means.difference,main="Histogram of Bootstrap Mean Differences",xlab="Mean Difference",ylab="Frequency")

> quantile(bs.means.difference,c(0.025,0.975))

2.5% 97.5%

2.455102 16.480545

> abline(v=quantile(bs.means.difference,c(0.025,0.975)),col="blue")

<Histogram 2 - to make 0 a center of the histogram - subtract mean(bs.means.difference from bs.means.difference)>

>hist(bs.means.difference-T.obs,main="Histogram of Bootstrap Mean Differences",xlab="Mean Difference",ylab="Frequency")

>abline(v=T.obs,col="red")

The area right side of the red line is about 0.00719 which is p-value .

저작자표시 (새창열림)

'R' 카테고리의 다른 글

(R) Two sample t-test (Student's t, Welch's t ) (0)	2020.10.27
(R) Permutation Test - Non parametric method (0)	2020.10.24
(R) One-sample Bootstrap Method (0)	2020.10.17
(R)How to add several line graphs to a graph / rearrange the order of colour label/ How to add colours to each graph (0)	2020.10.09
(R) Central limit theorem (0)	2020.09.27

Jangpiano Science

(R) Two sample Bootstrap Method

'R' 카테고리의 다른 글

티스토리툴바

(R) Two sample Bootstrap Method

'R' 카테고리의 다른 글

관련글

티스토리툴바