
(R) Two-sample t-test (Student's t, Welch's t)

by jangpiano 2020. 10. 27.

<Two Sample T-test>

A two-sample t-test checks whether the means of two populations are equal.

The null hypothesis is H0: the means of the two populations are equal (M(A) = M(B)).

The alternative hypothesis is H1: the means of the two populations are different (M(A) != M(B)).

Both tests share one assumption: "the sampling distributions are normally distributed."

That is, the sample mean of each group follows a normal distribution. Thanks to the central limit theorem, this holds approximately even when the groups themselves are not normally distributed, as long as the sample sizes are large enough, so both tests can still be used in that case.
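The central limit theorem claim above can be checked with a small simulation. This is only a sketch: the exponential population, the sample size of 50, the number of repetitions, and the seed are all assumptions chosen for illustration, not values from this post.

```r
# Sketch: sample means drawn from a skewed (exponential) population
set.seed(1)      # assumed seed, only for reproducibility
n    <- 50       # observations per sample (assumed to be "large enough")
reps <- 5000     # number of simulated samples

# One sample mean per simulated sample
sample_means <- replicate(reps, mean(rexp(n, rate = 1)))

# An exponential(1) population has mean 1 and standard deviation 1,
# so the sample means should center near 1 with spread near 1/sqrt(n).
mean(sample_means)
sd(sample_means)
1 / sqrt(n)

# hist(sample_means) can be used to look at the shape of the distribution.
```

With a much smaller n the histogram of sample_means stays visibly skewed, which is why the sample size matters for this assumption.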

Two versions of the two-sample t-test are covered here: Student's t-test and Welch's t-test.

The difference between the tests is what they assume about the variances of the two populations. Student's t-test assumes the two population variances are equal. Welch's t-test does not make that assumption, so it stays valid when the variances differ.

So if the two populations have the same variance, we prefer Student's t-test; if the variances differ, or we cannot tell, Welch's t-test is the safer choice.
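For reference, the two test statistics can be written out as follows. The notation is an assumption introduced here, not taken from the post: \bar{x}_A and \bar{x}_B are the sample means, s_A^2 and s_B^2 the sample variances, and n_A and n_B the sample sizes of groups A and B.

```latex
% Student's t-test: pooled variance, df = n_A + n_B - 2
s_p^2 = \frac{(n_A - 1)\, s_A^2 + (n_B - 1)\, s_B^2}{n_A + n_B - 2},
\qquad
t = \frac{\bar{x}_A - \bar{x}_B}{s_p \sqrt{\dfrac{1}{n_A} + \dfrac{1}{n_B}}}

% Welch's t-test: separate variances, approximate (Welch-Satterthwaite) df
t = \frac{\bar{x}_A - \bar{x}_B}{\sqrt{\dfrac{s_A^2}{n_A} + \dfrac{s_B^2}{n_B}}},
\qquad
\mathrm{df} \approx
\frac{\left( \dfrac{s_A^2}{n_A} + \dfrac{s_B^2}{n_B} \right)^2}
     {\dfrac{(s_A^2 / n_A)^2}{n_A - 1} + \dfrac{(s_B^2 / n_B)^2}{n_B - 1}}
```

When n_A = n_B, the two denominators are equal, so both tests give the same t value and differ only in the degrees of freedom.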


<R> 


> library(MASS)

> head(painters)


> View(painters)



> set.seed(3)


> B=which(painters$School=="B")

> B

[1] 11 12 13 14 15 16

> C=which(painters$School=="C")

> C

[1] 17 18 19 20 21 22

> Group_B=painters[B,]

> Group_B

> Group_C=painters[C,]

> Group_C



> Com=c(Group_B$Composition,Group_C$Composition)

> Com

 [1] 10 13 10 15 13 12 14 16 10 13 11 15

> School=c(Group_B$School,Group_C$School)

> School

 [1] 2 2 2 2 2 2 3 3 3 3 3 3


> d_1=as.data.frame(cbind(Com,School))

> d_1

   Com School

1   10      2

2   13      2

3   10      2

4   15      2

5   13      2

6   12      2

7   14      3

8   16      3

9   10      3

10  13      3

11  11      3

12  15      3
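The same two-group data can probably be built more directly. The sketch below is an alternative, not the method used above; the name d_2 is hypothetical. It keeps the school labels B and C instead of the integer codes 2 and 3. (In newer R versions, 4.1.0 and later as far as I know, c() on two factors returns a factor rather than integer codes, so the printed School vector above may look different there.)

```r
library(MASS)  # provides the painters data set

# Keep only schools B and C, and only the two columns used in the test
d_2 <- subset(painters, School %in% c("B", "C"),
              select = c(Composition, School))

# Drop the unused school levels so School has exactly the levels B and C
d_2$School <- droplevels(d_2$School)

table(d_2$School)

# Same test as on d_1, now with readable group labels in the output
t.test(Composition ~ School, data = d_2, var.equal = TRUE)
```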


<Student's T-TEST>


> t.test(Com~School,data=d_1,var.equal=TRUE)


Two Sample t-test


data:  Com by School

t = -0.81051, df = 10, p-value = 0.4365

alternative hypothesis: true difference in means is not equal to 0

95 percent confidence interval:

 -3.749041  1.749041

sample estimates:

mean in group 2 mean in group 3 

       12.16667        13.16667 
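To see where these numbers come from, the Student's t-test can be reproduced by hand with the pooled-variance formula. This is a sketch for checking the output above; the variable names are made up for illustration, and the scores are copied from the Com vector.

```r
# Composition scores copied from the Com vector above
x_B <- c(10, 13, 10, 15, 13, 12)   # school B (group 2)
x_C <- c(14, 16, 10, 13, 11, 15)   # school C (group 3)
n_B <- length(x_B)
n_C <- length(x_C)

# Pooled variance: weighted average of the two sample variances
s2_pooled <- ((n_B - 1) * var(x_B) + (n_C - 1) * var(x_C)) / (n_B + n_C - 2)

se      <- sqrt(s2_pooled * (1 / n_B + 1 / n_C))  # standard error of the difference
t_stat  <- (mean(x_B) - mean(x_C)) / se
df      <- n_B + n_C - 2
p_value <- 2 * pt(-abs(t_stat), df)               # two-sided p-value

# 95% confidence interval for the difference in means (B minus C)
ci <- (mean(x_B) - mean(x_C)) + c(-1, 1) * qt(0.975, df) * se

round(c(t = t_stat, df = df, p = p_value), 4)
ci
```

The values should agree with the t, df, p-value, and confidence interval printed by t.test above.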


<Welch's T-TEST>


> t.test(Com~School,data=d_1,var.equal=FALSE)


Welch Two Sample t-test


data:  Com by School

t = -0.81051, df = 9.7022, p-value = 0.4371

alternative hypothesis: true difference in means is not equal to 0

95 percent confidence interval:

 -3.76052  1.76052

sample estimates:

mean in group 2 mean in group 3 

       12.16667        13.16667 
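The Welch result can be checked by hand in the same way. The sketch below uses separate variances and the Welch-Satterthwaite approximation for the degrees of freedom; the variable names are again hypothetical. Because both groups have 6 observations, the standard error equals the pooled one, which is why t is identical to the Student's result and only df and the p-value change.

```r
# Composition scores copied from the Com vector above
x_B <- c(10, 13, 10, 15, 13, 12)   # school B (group 2)
x_C <- c(14, 16, 10, 13, 11, 15)   # school C (group 3)

# Each group's contribution to the variance of the difference in means
v_B <- var(x_B) / length(x_B)
v_C <- var(x_C) / length(x_C)

se_welch <- sqrt(v_B + v_C)
t_welch  <- (mean(x_B) - mean(x_C)) / se_welch

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (v_B + v_C)^2 /
  (v_B^2 / (length(x_B) - 1) + v_C^2 / (length(x_C) - 1))

p_welch <- 2 * pt(-abs(t_welch), df_welch)   # two-sided p-value

round(c(t = t_welch, df = df_welch, p = p_welch), 4)
```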



<Interpretation>


<Student's T-TEST>

> t.test(Com~School,data=d_1,var.equal=TRUE)  -> This runs Student's t-test; var.equal=TRUE tells R to assume equal variances.


Two Sample t-test


data:  Com by School

t = -0.81051, df = 10, p-value = 0.4365 -> The p-value is well above common significance levels such as 0.05, so we have no grounds to reject the null hypothesis.

alternative hypothesis: true difference in means is not equal to 0

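95 percent confidence interval: -> This is a 95% confidence interval for the difference between the two population means (group 2 minus group 3), not for a single mean. "95%" means that if we repeated the sampling many times, about 95% of the intervals built this way would contain the true difference. Here the interval is [-3.749041, 1.749041]; it contains 0, which agrees with not rejecting the null hypothesis. As the sample size increases, the interval gets narrower, which means a more precise estimate than a small sample gives.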

 -3.749041  1.749041

sample estimates:

mean in group 2 mean in group 3 

       12.16667        13.16667 




<Welch's T-TEST>


> t.test(Com~School,data=d_1,var.equal=FALSE) -> This runs Welch's t-test; var.equal=FALSE, which is also the default of t.test, tells R not to assume equal variances.


Welch Two Sample t-test


data:  Com by School

t = -0.81051, df = 9.7022, p-value = 0.4371 -> The p-value is well above common significance levels such as 0.05, so we have no grounds to reject the null hypothesis.

alternative hypothesis: true difference in means is not equal to 0

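95 percent confidence interval: -> This is a 95% confidence interval for the difference between the two population means (group 2 minus group 3): if we repeated the sampling many times, about 95% of the intervals built this way would contain the true difference. Here the interval is [-3.76052, 1.76052], slightly wider than the Student's interval because of the smaller degrees of freedom. It also contains 0, which agrees with not rejecting the null hypothesis. As the sample size increases, the interval gets narrower, which means a more precise estimate than a small sample gives.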

 -3.76052  1.76052

sample estimates:

mean in group 2 mean in group 3 

       12.16667        13.16667 
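Instead of reading the numbers off the printed output, they can be pulled out of the object that t.test returns. This is a minimal sketch: the data frame is rebuilt from the values shown earlier so the block runs on its own, and alpha = 0.05 is an assumed significance level, not one stated in this post.

```r
# Rebuild d_1 from the values printed earlier
Com    <- c(10, 13, 10, 15, 13, 12, 14, 16, 10, 13, 11, 15)
School <- rep(c(2, 3), each = 6)
d_1    <- data.frame(Com, School)

student <- t.test(Com ~ School, data = d_1, var.equal = TRUE)
welch   <- t.test(Com ~ School, data = d_1, var.equal = FALSE)

student$p.value     # p-value of Student's t-test
student$conf.int    # 95% confidence interval for the difference in means
welch$parameter     # Welch degrees of freedom
welch$p.value       # p-value of Welch's t-test

alpha <- 0.05       # assumed significance level
student$p.value < alpha   # TRUE would mean "reject H0"
welch$p.value < alpha
```

Both comparisons come out FALSE for this data, matching the interpretation above: neither test rejects the null hypothesis.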



