<split>
The split function divides the data in x into the groups defined by f and returns them as a list, with one element per group.
The split function has two main arguments:
x: a vector or a data frame
f: a factor or a list of factors
>?split
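As a minimal sketch, the call below uses a hypothetical toy vector and factor (not from the post). split() returns a named list with one element per factor level, and each element holds the values of x in that level. Base R's unsplit() reverses the operation.

```r
# Hypothetical toy data: six values and a factor assigning each to group "a" or "b"
x <- c(10, 20, 30, 40, 50, 60)
f <- factor(c("a", "b", "a", "b", "a", "b"))

# split() returns a named list; the names are the levels of f
groups <- split(x, f)
print(groups)  # groups$a holds 10 30 50, groups$b holds 20 40 60

# unsplit() puts the pieces back in the original order
print(unsplit(groups, f))
```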
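<EXAMPLE 1>
sapply applies a function (here mean) to each element of a list and simplifies the results into a named vector.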
> list_1=list(x=rnorm(5,0,1),y=runif(3,5,10))
> list_1
$x
[1] -0.1205420 0.6790684 -1.0631887 0.7876966 0.3130599
$y
[1] 8.521317 5.693054 5.061110
> sapply(list_1,mean)
x y
0.1192188 6.4251602
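rnorm() and runif() draw random numbers, so the values above change on every run. Calling set.seed() first would make them repeatable. The following reproducible sketch uses fixed, made-up values instead. It also contrasts sapply() with lapply(), which always returns a list.

```r
# Fixed, hypothetical values so the result is reproducible
list_fixed <- list(x = c(1, 2, 3, 4), y = c(5, 10))

# sapply() simplifies the per-element results into a named numeric vector
means_vec <- sapply(list_fixed, mean)
print(means_vec)  # x = 2.5, y = 7.5

# lapply() applies the same function but keeps the results as a list
means_list <- lapply(list_fixed, mean)
print(means_list)
```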
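<EXAMPLE 2>
This example splits the daily temperatures (Temp) by month (Month). The built-in airquality data set has only six columns. The SOLAR column in the str() output below was created in a later section of this post.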
> str(airquality)
'data.frame': 153 obs. of 7 variables:
$ Ozone : int 41 36 12 18 NA 28 23 19 8 NA ...
$ Solar.R: int 190 118 149 313 NA NA 299 99 19 194 ...
$ Wind : num 7.4 8 12.6 11.5 14.3 14.9 8.6 13.8 20.1 8.6 ...
$ Temp : int 67 72 74 62 56 66 65 59 61 69 ...
$ Month : int 5 5 5 5 5 5 5 5 5 5 ...
$ Day : int 1 2 3 4 5 6 7 8 9 10 ...
$ SOLAR : chr "med" "med" "med" "strong" ...
> split(airquality$Temp,airquality$Month)
$`5`
[1] 67 72 74 62 56 66 65 59 61 69 74 69 66 68 58 64 66 57 68 62 59 73 61 61 57
[26] 58 57 67 81 79 76
$`6`
[1] 78 74 67 84 85 79 82 87 90 87 93 92 82 80 79 77 72 65 73 76 77 76 76 76 75
[26] 78 73 80 77 83
$`7`
[1] 84 85 81 84 83 83 88 92 92 89 82 73 81 91 80 81 82 84 87 85 74 81 82 86 85
[26] 82 86 88 86 83 81
$`8`
[1] 81 81 82 86 85 87 89 90 90 92 86 86 82 80 79 77 79 76 78 78 77 72 75 79 81
[26] 86 88 97 94 96 94
$`9`
[1] 91 92 93 93 87 84 80 78 75 73 81 76 77 71 71 78 67 76 68 82 64 71 81 69 63
[26] 70 77 75 76 68
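Here is a quick sanity check, as a sketch. lengths() counts the values in each monthly group, which gives one value per day of the month. The argument list above says x may also be a data frame, so the same factor can split whole rows into a list of data frames.

```r
# Split daily temperatures by month, as in the example above
temp_by_month <- split(airquality$Temp, airquality$Month)

# Number of daily observations per month, May to September
print(lengths(temp_by_month))  # 31 30 31 31 30

# split() on a data frame returns a list of data frames, one per month
aq_by_month <- split(airquality, airquality$Month)
print(sapply(aq_by_month, nrow))
print(head(aq_by_month[["7"]], 3))
```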
<sapply vs tapply>
sapply: two main arguments (X, FUN)
> sapply(airquality,mean)
    Ozone   Solar.R      Wind      Temp     Month       Day
       NA        NA  9.957516 77.882353  6.993464 15.803922

tapply: three main arguments (X, INDEX, FUN)
> tapply(airquality$Temp,airquality$Month,mean)
       5        6        7        8        9
65.54839 79.10000 83.90323 83.96774 76.90000
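Ozone and Solar.R contain missing values, so sapply() reports NA for them above. Any arguments given after FUN are passed on to the function. The sketch below uses that to ask mean() to drop the NAs.

```r
# Arguments after FUN (here na.rm = TRUE) are forwarded to mean()
col_means <- sapply(airquality, mean, na.rm = TRUE)
print(round(col_means, 2))

# Count the missing values per column to see where the NAs came from
print(colSums(is.na(airquality)))
```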
Combining split with sapply lets us apply a function to each group of values given by a unique combination of the levels of certain factors. tapply does the same thing in a single call.
[one factor, one function]
split + sapply:
> airquality_2=split(airquality$Temp,airquality$Month)
> airquality_2
$`5`
 [1] 67 72 74 62 56 66 65 59 61 69 74 69 66 68 58 64 66 57 68 62 59 73 61 61
[25] 57 58 57 67 81 79 76
$`6`
 [1] 78 74 67 84 85 79 82 87 90 87 93 92 82 80 79 77 72 65 73 76 77 76 76 76
[25] 75 78 73 80 77 83
$`7`
 [1] 84 85 81 84 83 83 88 92 92 89 82 73 81 91 80 81 82 84 87 85 74 81 82 86
[25] 85 82 86 88 86 83 81
$`8`
 [1] 81 81 82 86 85 87 89 90 90 92 86 86 82 80 79 77 79 76 78 78 77 72 75 79
[25] 81 86 88 97 94 96 94
$`9`
 [1] 91 92 93 93 87 84 80 78 75 73 81 76 77 71 71 78 67 76 68 82 64 71 81 69
[25] 63 70 77 75 76 68
> Temp_month_mean_2=sapply(airquality_2,mean)
> Temp_month_mean_2
       5        6        7        8        9
65.54839 79.10000 83.90323 83.96774 76.90000

tapply:
> tapply(airquality$Temp,airquality$Month,mean)
       5        6        7        8        9
65.54839 79.10000 83.90323 83.96774 76.90000
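Both results above print the same numbers, and a small sketch can confirm this programmatically. tapply() returns a one-dimensional array, while sapply() returns a plain named vector, so the sketch compares the values and the group names separately.

```r
via_split  <- sapply(split(airquality$Temp, airquality$Month), mean)
via_tapply <- tapply(airquality$Temp, airquality$Month, mean)

# Same group means, compared without the differing array attributes
print(all.equal(unname(via_split), as.vector(via_tapply)))

# Same group labels: the month levels "5" to "9"
print(identical(names(via_split), dimnames(via_tapply)[[1]]))
```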
[one factor, more than one function]
split + sapply:
> mean.sd=function(x)c(Mean=mean(x),SD=sd(x))
> airquality_2=split(airquality$Temp,airquality$Month)
> sapply(airquality_2,mean.sd)
            5         6         7         8         9
Mean 65.54839 79.100000 83.903226 83.967742 76.900000
SD    6.85487  6.598589  4.315513  6.585256  8.355671
> sapply(airquality_2,mean.sd,simplify=F)
$`5`
    Mean       SD
65.54839  6.85487
$`6`
     Mean        SD
79.100000  6.598589
$`7`
     Mean        SD
83.903226  4.315513
$`8`
     Mean        SD
83.967742  6.585256
$`9`
     Mean        SD
76.900000  8.355671
With simplify=F, sapply returns a list instead of a matrix. This is the same form tapply returns here.

tapply:
> tapply(airquality$Temp,airquality$Month,mean.sd)
$`5`
    Mean       SD
65.54839  6.85487
$`6`
     Mean        SD
79.100000  6.598589
$`7`
     Mean        SD
83.903226  4.315513
$`8`
     Mean        SD
83.967742  6.585256
$`9`
     Mean        SD
76.900000  8.355671
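When FUN returns more than one value, tapply() gives back a list rather than a matrix, as printed above. One way to get a month-by-statistic table, sketched here with base R only, is to bind the list elements row-wise with do.call(rbind, ...). Transposing the split + sapply matrix gives the same numbers.

```r
mean.sd <- function(x) c(Mean = mean(x), SD = sd(x))

res <- tapply(airquality$Temp, airquality$Month, mean.sd)

# rbind the length-2 vectors: one row per month, columns Mean and SD
res_mat <- do.call(rbind, res)
print(res_mat)

# The transposed split + sapply result holds the same values
alt_mat <- t(sapply(split(airquality$Temp, airquality$Month), mean.sd))
print(all.equal(unname(res_mat), unname(alt_mat)))
```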
[More than one factor, one function]
> airquality$SOLAR<-ifelse(airquality$Solar.R<=115,"weak",ifelse(airquality$Solar.R<=205,"med","strong"))
> table(airquality$SOLAR)
med strong weak
36 73 37
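The nested ifelse() above works. Base R's cut() is a common alternative (a different technique from the one used in this post) for binning a numeric variable. The sketch below assumes the same break points (115 and 205) with right-closed intervals. It checks that both versions agree, including on the rows where Solar.R is missing.

```r
# Approach from the text: nested ifelse()
solar_ifelse <- ifelse(airquality$Solar.R <= 115, "weak",
                       ifelse(airquality$Solar.R <= 205, "med", "strong"))

# Alternative with cut(): intervals (-Inf, 115], (115, 205], (205, Inf]
solar_cut <- cut(airquality$Solar.R,
                 breaks = c(-Inf, 115, 205, Inf),
                 labels = c("weak", "med", "strong"))

# Both versions should classify every day identically (NA stays NA)
print(identical(as.character(solar_cut), solar_ifelse))
print(table(solar_cut))
```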
split + sapply:
> airquality_3=split(airquality$Temp,list(airquality$Month,airquality$SOLAR))
> airquality_3
$`5.med`
[1] 67 72 74 69
$`6.med`
 [1] 84 82 82 77 73 76 77 75 78 83
$`7.med`
[1] 83 89 82 81 87
$`8.med`
[1] 86 86 80 78 88 97 94
$`9.med`
 [1] 91 92 93 93 82 81 70 77 75 76
$`5.strong`
 [1] 62 65 69 66 68 64 66 68 73 58 81 79 76
$`6.strong`
 [1] 78 74 67 85 79 87 90 87 93 92 80 79 72 76
$`7.strong`
 [1] 84 85 81 83 88 92 92 73 91 81 82 84 85 81 82 86 85 88 86 83 81
$`8.strong`
 [1] 89 90 90 92 82 78 77 75 79 81 86 94 96
$`9.strong`
 [1] 80 78 75 73 81 76 77 78 67 68 64 68
$`5.weak`
 [1] 59 61 58 57 62 59 61 61 57 67
$`6.weak`
[1] 65 76 76 73 80 77
$`7.weak`
[1] 84 80 74 82 86
$`8.weak`
[1] 81 81 82 79 77 79 76 72
$`9.weak`
[1] 87 84 71 71 76 71 69 63

tapply:
> tapply(airquality$Temp,list(airquality$Month,airquality$SOLAR),mean.sd)
  med       strong    weak
5 Numeric,2 Numeric,2 Numeric,2
6 Numeric,2 Numeric,2 Numeric,2
7 Numeric,2 Numeric,2 Numeric,2
8 Numeric,2 Numeric,2 Numeric,2
9 Numeric,2 Numeric,2 Numeric,2
This tapply call passes mean.sd, which returns two values per group. Each cell of the result is therefore a length-2 numeric vector, printed as Numeric,2.
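The split + sapply side above stops at the split step. The following sketch shows the sapply step, applying the single function mean to each Month.SOLAR group. For comparison, it also calls tapply with mean, which returns a Month x SOLAR matrix. SOLAR is rebuilt first with the thresholds from the text, so the block runs on its own.

```r
# Rebuild the SOLAR grouping used in the text (same break points)
airquality$SOLAR <- ifelse(airquality$Solar.R <= 115, "weak",
                           ifelse(airquality$Solar.R <= 205, "med", "strong"))

# Split by two factors; group names look like "5.med", "6.strong", ...
airquality_3 <- split(airquality$Temp, list(airquality$Month, airquality$SOLAR))

# One function applied to each of the 15 groups gives a named numeric vector
group_means <- sapply(airquality_3, mean)
print(round(group_means, 2))

# tapply() with the same two factors returns a Month x SOLAR matrix instead
month_solar_means <- tapply(airquality$Temp,
                            list(airquality$Month, airquality$SOLAR), mean)
print(round(month_solar_means, 2))
```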
[More than one factor, more than one function]
tapply:
> mean.sd=function(x)c(Mean=mean(x),SD=sd(x))
> tapply(airquality$Temp,list(airquality$Month,airquality$SOLAR),mean.sd)
  med       strong    weak
5 Numeric,2 Numeric,2 Numeric,2
6 Numeric,2 Numeric,2 Numeric,2
7 Numeric,2 Numeric,2 Numeric,2
8 Numeric,2 Numeric,2 Numeric,2
9 Numeric,2 Numeric,2 Numeric,2
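For the sapply side, a sketch along the same lines passes mean.sd to each Month.SOLAR group, giving a 2 x 15 matrix. The tapply result prints as Numeric,2 because each cell of the list-matrix holds a length-2 vector. Double-bracket indexing with a row name and a column name pulls one of those vectors out. SOLAR is rebuilt with the thresholds from the text so the block is self-contained.

```r
mean.sd <- function(x) c(Mean = mean(x), SD = sd(x))

# Rebuild SOLAR with the thresholds used in the text
airquality$SOLAR <- ifelse(airquality$Solar.R <= 115, "weak",
                           ifelse(airquality$Solar.R <= 205, "med", "strong"))

# split + sapply: rows Mean/SD, one column per Month.SOLAR group
airquality_3 <- split(airquality$Temp, list(airquality$Month, airquality$SOLAR))
stats_mat <- sapply(airquality_3, mean.sd)
print(round(stats_mat, 2))

# tapply: a 5 x 3 list-matrix; [[row, col]] extracts one cell's vector
res2 <- tapply(airquality$Temp, list(airquality$Month, airquality$SOLAR), mean.sd)
print(res2[["5", "med"]])  # May, "med" group: temperatures 67, 72, 74, 69
```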