<(R) Simple Linear Regression Analysis - ANOVA Table>
<Simple Linear Regression>
<Generating Random Data>
> X=sort(sample(x=0:10, size=15, replace=TRUE))
> Y=sort(sample(x=0:10, size=15, replace=TRUE))
> data_1=data.frame(X,Y)
> data_1
X Y
1 0 0
2 0 1
3 0 1
4 2 3
5 3 3
6 3 4
7 3 4
8 4 4
9 4 5
10 6 5
11 7 6
12 7 7
13 7 7
14 8 9
15 10 10
> plot(X, Y , xlim = c(0, 10), ylim = c(0, 10))
<Finding the Least-Squares Estimates>
> res=lm(Y~X, data_1 ) # lm(response ~ predictor, data)
> res
Call:
lm(formula = Y ~ X, data = data_1)
Coefficients: # in order: least-squares estimate of the intercept, least-squares estimate of the slope
(Intercept) X
0.7799 0.8953
> abline(res) # draws the fitted regression line on the scatter plot of X and Y
> summary(res)
Call:
lm(formula = Y ~ X, data = data_1)
Residuals:
Min 1Q Median 3Q Max
-1.1519 -0.4136 0.2201 0.4817 1.0575
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.77994 0.29927 2.606 0.0218 *
X 0.89533 0.05724 15.641 0.000000000823 ***
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.6698 on 13 degrees of freedom
Multiple R-squared: 0.9495, Adjusted R-squared: 0.9457
F-statistic: 244.6 on 1 and 13 DF, p-value: 0.000000000823
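The estimates that lm() reports can be reproduced from the closed-form least-squares formulas. The following is a minimal sketch, assuming the 15 (X, Y) pairs printed in the data_1 table are typed in by hand (the sample() calls are not seeded, so rerunning them would give different data). The names Sxx, Sxy, b0 and b1 are chosen here for illustration only.

```r
# Data copied by hand from the printed data_1 table (assumption: not regenerated).
X <- c(0, 0, 0, 2, 3, 3, 3, 4, 4, 6, 7, 7, 7, 8, 10)
Y <- c(0, 1, 1, 3, 3, 4, 4, 4, 5, 5, 6, 7, 7, 9, 10)

# Corrected sums of squares and cross-products
Sxx <- sum((X - mean(X))^2)
Sxy <- sum((X - mean(X)) * (Y - mean(Y)))

# Closed-form least-squares estimates of slope and intercept
b1 <- Sxy / Sxx
b0 <- mean(Y) - b1 * mean(X)

# Compare with the coefficients from lm()
fit <- lm(Y ~ X)
print(c(b0 = b0, b1 = b1))
print(coef(fit))
```

Both printed vectors should agree with the intercept and slope shown in the lm() output above.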
<95% Confidence Intervals for B0 and B1>
# R code to compute the confidence interval (conf_int) for B0 or B1
# X, Y: data vectors; a: significance level (e.g. 0.05); B: "B0" or "B1"
conf_int = function(X, Y, a, B){
  n = length(X)
  fit = lm(Y~X)                              # fit the model once and reuse it
  sample_mean = mean(X)
  Sxx = sum((X - sample_mean)^2)
  t_critical = abs(qt(a/2, n-2))             # two-sided critical value with n-2 df
  estimate_sd = sqrt(deviance(fit)/(n-2))    # sqrt(MSE): estimate of sigma, not of the variance
  if (B=="B0"){
    estimate = coef(fit)[1]
    std_error = estimate_sd * sqrt(1/n + sample_mean^2/Sxx)
  } else if (B=="B1"){
    estimate = coef(fit)[2]
    std_error = estimate_sd * sqrt(1/Sxx)
  } else {
    stop('B must be "B0" or "B1"')           # previously returned NULL silently
  }
  return(c(estimate - t_critical * std_error, estimate + t_critical * std_error))
}
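As a cross-check on a hand-written interval function, base R's confint() returns t-based intervals of the same kind for a fitted lm object. The sketch below assumes the data_1 values printed earlier, typed in by hand. It recomputes the B1 interval manually from the coefficient table and compares it with confint(). The names ci and manual_b1 are illustrative.

```r
# Data copied by hand from the printed data_1 table (assumption).
X <- c(0, 0, 0, 2, 3, 3, 3, 4, 4, 6, 7, 7, 7, 8, 10)
Y <- c(0, 1, 1, 3, 3, 4, 4, 4, 5, 5, 6, 7, 7, 9, 10)
fit <- lm(Y ~ X)

# Built-in 95% confidence intervals for both coefficients
ci <- confint(fit, level = 0.95)
print(ci)

# Manual interval for the slope: estimate +/- t critical value * standard error
est    <- coef(summary(fit))["X", "Estimate"]
se     <- coef(summary(fit))["X", "Std. Error"]
t_crit <- qt(0.975, df.residual(fit))
manual_b1 <- c(est - t_crit * se, est + t_crit * se)
print(manual_b1)
```

With this data, calling conf_int(X, Y, 0.05, "B1") should return the same two bounds as the "X" row of ci.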
<Tests of B0=0 and B1=0>
t test statistic for B0=0 : 2.606
t test statistic for B1=0 : 15.641
t(0.025, 13) = 2.160369
Both test statistics exceed t(0.025, 13), so both null hypotheses (B0=0 and B1=0) are rejected at the 5% significance level.
Equivalently, the p-values are 0.0218 and 0.000000000823, both below 0.05, which leads to the same rejections.
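The critical value and the two-sided p-values can be recomputed from the t distribution with 13 degrees of freedom. This is a small sketch that plugs in the rounded t statistics from the summary() output, so the p-values match the reported ones only up to rounding. The variable names are illustrative.

```r
df <- 13                                  # residual degrees of freedom (n - 2)
t_stats <- c(B0 = 2.606, B1 = 15.641)     # rounded t statistics from summary(res)

# Two-sided critical value at the 5% level
t_crit <- qt(1 - 0.05 / 2, df)

# Two-sided p-values and the resulting decisions
p_values <- 2 * pt(-abs(t_stats), df)
reject   <- abs(t_stats) > t_crit

print(t_crit)
print(p_values)
print(reject)
```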
> anova(res)
Analysis of Variance Table
Response: Y
Df Sum Sq Mean Sq F value Pr(>F)
X 1 109.767 109.767 244.64 0.000000000823 ***
Residuals 13 5.833 0.449
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
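Residual sum of squares (SSE) = 5.833
Estimate of the variance (MSE) = 0.449
p-value = 0.000000000823 ---> reject H0 (yi = B0 + εi, i.e. B1 = 0). ---> The data show a statistically significant linear relationship between X and Y.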
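The entries of the ANOVA table can also be built by hand from the sum-of-squares decomposition SST = SSR + SSE. The sketch below again assumes the printed data_1 values typed in by hand, with illustrative variable names. It also checks that in simple linear regression the F statistic equals the square of the slope's t statistic.

```r
# Data copied by hand from the printed data_1 table (assumption).
X <- c(0, 0, 0, 2, 3, 3, 3, 4, 4, 6, 7, 7, 7, 8, 10)
Y <- c(0, 1, 1, 3, 3, 4, 4, 4, 5, 5, 6, 7, 7, 9, 10)
fit <- lm(Y ~ X)
n <- length(Y)

SST <- sum((Y - mean(Y))^2)     # total sum of squares (corrected for the mean)
SSE <- sum(resid(fit)^2)        # residual sum of squares
SSR <- SST - SSE                # regression sum of squares

MSE     <- SSE / (n - 2)        # estimate of the error variance
F_stat  <- (SSR / 1) / MSE      # F statistic on 1 and n - 2 df
p_value <- pf(F_stat, 1, n - 2, lower.tail = FALSE)

# t statistic of the slope, for comparison with sqrt(F)
t_slope <- coef(summary(fit))["X", "t value"]

print(c(SSR = SSR, SSE = SSE, MSE = MSE, F = F_stat, p = p_value))
print(t_slope^2)
```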
<ANOVA Table Summary>
> X=sort(sample(x=0:10, size=15, replace=TRUE))
> Y=sort(sample(x=0:10, size=15, replace=TRUE))
> lm(Y~X)
Call:
lm(formula = Y ~ X)
Coefficients:
(Intercept) X
-0.005072 1.016061
> summary(lm(Y~X))
Call:
lm(formula = Y ~ X)
Residuals:
Min 1Q Median 3Q Max
-2.09129 -0.14751 -0.02705 0.43280 1.90871
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.005072 0.437673 -0.012 0.991
X 1.016061 0.079294 12.814 0.00000000947 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.9959 on 13 degrees of freedom
Multiple R-squared: 0.9266, Adjusted R-squared: 0.921
F-statistic: 164.2 on 1 and 13 DF, p-value: 0.000000009475
> anova(lm(Y~X))
Analysis of Variance Table
Response: Y
Df Sum Sq Mean Sq F value Pr(>F)
X 1 162.841 162.841 164.2 0.000000009475 ***
Residuals 13 12.893 0.992
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
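For reference, the layout that anova() prints for a simple linear regression with n observations follows the standard textbook table below. The symbols SSR, SSE, MSR and MSE are the usual names for these quantities and are not printed by R itself.

```latex
\begin{array}{l|c|c|c|c}
\text{Source} & \text{Df} & \text{Sum Sq} & \text{Mean Sq} & F \\ \hline
X \ (\text{regression}) & 1 & SSR = \sum_i (\hat{y}_i - \bar{y})^2 & MSR = SSR / 1 & MSR / MSE \\
\text{Residuals} & n - 2 & SSE = \sum_i (y_i - \hat{y}_i)^2 & MSE = SSE / (n - 2) &
\end{array}
```

In the output above, n = 15, so MSE = 12.893 / 13 ≈ 0.992 and F = 162.841 / 0.992 ≈ 164.2. The residual standard error 0.9959 is the square root of MSE, and R-squared is 162.841 / (162.841 + 12.893) ≈ 0.9266.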
<Regression Through the Origin>
> # Zero intercept: regression results and ANOVA table for the model through the origin
> res = lm(Y~X + 0 , data_1 )
> res
Call:
lm(formula = Y ~ X + 0, data = data_1)
Coefficients:
X
1.226
> abline(res, col="red")
> summary(res)
Call:
lm(formula = Y ~ X + 0, data = data_1)
Residuals:
Min 1Q Median 3Q Max
-2.2571 0.3229 0.8714 2.7743 3.7743
Coefficients:
Estimate Std. Error t value Pr(>|t|)
X 1.2257 0.1222 10.03 0.0000000899 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 2.286 on 14 degrees of freedom
Multiple R-squared: 0.8778, Adjusted R-squared: 0.8691
F-statistic: 100.6 on 1 and 14 DF, p-value: 0.00000008995
> anova(res)
Analysis of Variance Table
Response: Y
Df Sum Sq Mean Sq F value Pr(>F)
X 1 525.83 525.83 100.61 0.00000008995 *** # reject H0 : B1 = 0
Residuals 14 73.17 5.23 # in order: SSE, MSE
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual sum of squares (SSE) = 73.17
Estimate of the variance (MSE) = 5.23
p-value = 0.00000008995 ---> reject H0: B1 = 0 in the through-origin model yi = B1xi + εi (B0 fixed at 0)
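For the model through the origin, the standard least-squares formulas differ from the intercept model in two ways: the residual degrees of freedom are n - 1 rather than n - 2, and the total sum of squares is not corrected for the mean. A short summary with the numbers from the table above plugged in:

```latex
\begin{aligned}
\hat{B}_1 &= \frac{\sum_i x_i y_i}{\sum_i x_i^2}, \qquad
SSE = \sum_i \left( y_i - \hat{B}_1 x_i \right)^2, \qquad
MSE = \frac{SSE}{n - 1} \\
\sum_i y_i^2 &= SSR + SSE = 525.83 + 73.17 = 599.00 \\
MSE &= \frac{73.17}{14} \approx 5.23, \qquad
F = \frac{SSR / 1}{MSE} = \frac{525.83}{5.23} \approx 100.6, \qquad
R^2 = \frac{525.83}{599.00} \approx 0.8778
\end{aligned}
```

Because R-squared here is measured against the uncorrected total sum of squares, it is not directly comparable with the R-squared of a model that includes an intercept.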