(R) Combining data frames, matrices, and vectors using rbind, cbind / Merging data frames using merge()

by jangpiano 2020. 11. 7.
Combining Data Frames


<rbind : combine rows of two or more data frames>

If you have two data frames with the same number of columns and the same column names (x, y), rbind stacks their rows into a single data frame with those shared columns. If the column names differ, rbind stops with an error.


> dat1=data.frame(x=1:6, y=rep(1,6))

> dat1

  x y

1 1 1

2 2 1

3 3 1

4 4 1

5 5 1

6 6 1

> dat2 = data.frame(x = 3:5, y = rep(2, 3))

> dat2

  x y

1 3 2

2 4 2

3 5 2

> dat3 = rbind(dat1, dat2)

> dat3

  x y

1 1 1

2 2 1

3 3 1

4 4 1

5 5 1

6 6 1

7 3 2

8 4 2

9 5 2
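
To see the column-name requirement in action, here is a minimal sketch. The data frames `a`, `b`, and `c_bad` are made up for illustration and are not part of the example above; `tryCatch` is used only so the error message can be captured instead of stopping the script.

```r
# Hypothetical data frames for illustration (not from the example above)
a     <- data.frame(x = 1:2, y = c(10, 20))
b     <- data.frame(y = c(30, 40), x = 3:4)   # same names, different order
c_bad <- data.frame(x = 5:6, w = c(50, 60))   # column "w" instead of "y"

# For data frames, rbind matches columns by name, not by position
ab <- rbind(a, b)
print(ab)

# Column names differ, so rbind stops with an error; capture its message
msg <- tryCatch(
  {
    rbind(a, c_bad)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```

Because matching is done by name, the column order of the second data frame does not matter, but a single differently named column is enough to make the call fail.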


<cbind : combine columns of two or more data frames>


If two data frames have the same number of rows (9 in this example), cbind places their columns side by side. Here dat3 (2 columns) and dat4 (2 columns) are combined into a data frame with 9 rows and 4 columns.


> dat3

  x y

1 1 1

2 2 1

3 3 1

4 4 1

5 5 1

6 6 1

7 3 2

8 4 2

9 5 2

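> dat4 = data.frame(z = rep(3, 9), h = 7:15)  # assumed definition, consistent with the z and h columns printed below

> dat5 = cbind(dat3, dat4)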

> dat5

  x y z  h

1 1 1 3  7

2 2 1 3  8

3 3 1 3  9

4 4 1 3 10

5 5 1 3 11

6 6 1 3 12

7 3 2 3 13

8 4 2 3 14

9 5 2 3 15
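
The row-count requirement can be checked with a small sketch. The data frames `left`, `right`, and `short` are hypothetical names chosen for illustration; `tryCatch` only captures the error message so the script keeps running.

```r
# Hypothetical data frames for illustration
left  <- data.frame(x = 1:3, y = c("a", "b", "c"))
right <- data.frame(z = c(TRUE, FALSE, TRUE))   # 3 rows, like `left`
short <- data.frame(h = 1:2)                    # only 2 rows

# Same number of rows: columns are placed side by side
wide <- cbind(left, right)
print(dim(wide))      # rows, columns
print(names(wide))

# Row counts differ (3 vs 2), so cbind cannot line the rows up
msg <- tryCatch(
  {
    cbind(left, short)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```

Note that cbind pairs rows purely by position. It does not look at the values in any column, which is the main contrast with merge.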


Merging data frames 


I want to emphasize the phrase 'common columns or row names' in the definition of the merge function. The core difference between merge and rbind/cbind is that merge joins two data frames on the columns they have in common and, by default, keeps only the rows whose key values appear in both. That is, the resulting data frame contains only the subjects matched by id. The example below shows that merge uses 'id', the column present in both data frames, and keeps only the id values found in both (5 and 8), dropping 6 (only in dat1) and 1 (only in dat2).


 dat1 = data.frame(id = c(5,6,8), x = c(1,5,7))

 dat2 = data.frame(id = c(1, 5, 8), y = c("a","b","c"))

 merge(dat1,dat2)

> dat1

  id x

1  5 1

2  6 5

3  8 7

> dat2

  id y

1  1 a

2  5 b

3  8 c

> merge(dat1,dat2)

  id x y

1  5 1 b

2  8 7 c 
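
As a minimal sketch of what "common columns" means here: when no `by` argument is given, merge uses the column names shared by both data frames as the key. The variable names `key`, `m_default`, and `m_explicit` are introduced only for this illustration.

```r
dat1 <- data.frame(id = c(5, 6, 8), x = c(1, 5, 7))
dat2 <- data.frame(id = c(1, 5, 8), y = c("a", "b", "c"))

# Column names present in both data frames: the default join key
key <- intersect(names(dat1), names(dat2))
print(key)

# With this data, the two calls below give the same result
m_default  <- merge(dat1, dat2)
m_explicit <- merge(dat1, dat2, by = key)
print(m_default)

# The ids that survive are exactly those found in both inputs
print(intersect(dat1$id, dat2$id))
```

Writing `by = "id"` explicitly is a reasonable habit, since it keeps the result stable if another shared column name is added to either data frame later.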


<Difference between merge, rbind, and cbind>

merge(dat1,dat2)

 rbind(dat1, dat2)

 cbind(dat1, dat2)

 > merge(dat1,dat2)

  id x y

1  5 1 b

2  8 7 c 

> rbind(dat1, dat2)

Error in match.names(clabs, names(xi)) : 

  names do not match previous names 

 > cbind(dat1, dat2)

  id x id y

1  5 1  1 a

2  6 5  5 b

3  8 7  8 c


<Adjusting the merge function>


merge(dat1,dat2)

 merge(dat1,dat2,all.x=TRUE)

 merge(dat1,dat2,all.y=TRUE)

 > merge(dat1,dat2)

  id x y

1  5 1 b

2  8 7 c 

> merge(dat1,dat2,all.x=TRUE)

  id x    y

1  5 1    b

2  6 5 <NA>

3  8 7    c

> merge(dat1,dat2,all.y=TRUE)

  id  x y

1  1 NA a

2  5  1 b

3  8  7 c 


merge(dat1,dat2)

 merge(dat1,dat2,all=TRUE) 

 merge(dat1,dat2,all=TRUE, by="id")

 > merge(dat1,dat2)

  id x y

1  5 1 b

2  8 7 c 

> merge(dat1,dat2,all=TRUE) 

  id  x    y

1  1 NA    a

2  5  1    b

3  6  5 <NA>

4  8  7    c 

> merge(dat1,dat2,all=TRUE, by="id")

  id  x    y

1  1 NA    a

2  5  1    b

3  6  5 <NA>

4  8  7    c 
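
The four variants above can be compared side by side. This is a minimal sketch using the same dat1 and dat2; the names `inner`, `left`, `right`, and `full` are borrowed from database join terminology and are only labels chosen for this illustration.

```r
dat1 <- data.frame(id = c(5, 6, 8), x = c(1, 5, 7))
dat2 <- data.frame(id = c(1, 5, 8), y = c("a", "b", "c"))

inner <- merge(dat1, dat2)                 # ids found in both
left  <- merge(dat1, dat2, all.x = TRUE)   # every id from dat1
right <- merge(dat1, dat2, all.y = TRUE)   # every id from dat2
full  <- merge(dat1, dat2, all = TRUE)     # every id from either

# Number of rows kept by each variant
counts <- sapply(list(inner = inner, left = left, right = right, full = full), nrow)
print(counts)

# Cells with no match on the other side are filled with NA
print(full[!complete.cases(full), ])
```

The rows printed at the end are id 1 (no x, because it is missing from dat1) and id 6 (no y, because it is missing from dat2).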


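How to merge two data frames whose key columns have different names: use by.x and by.y to name the key in each data frame. Columns that share a name but are not keys (y here) get the suffixes .x and .y, which can be changed with the suffixes argument.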


> dat2

  id y

1  1 a

2  5 b

3  8 c

> dat3 = data.frame(id2 = c(1, 3, 4), y = c(1, 8, 9))

> dat3

  id2 y

1   1 1

2   3 8

3   4 9

> merge(dat2, dat3, by.x = "id", by.y = "id2")

  id y.x y.y

1  1   a   1

> merge(dat3, dat2, by.x= "id2", by.y="id") # compare the column order

  id2 y.x y.y

1   1   1   a


> merge(dat2, dat3, by.x = "id", by.y = "id2", suffixes = c(".dat2", ".dat3"))

  id y.dat2 y.dat3

1  1      a      1
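
An alternative to by.x / by.y is to rename the key column first, sketched below under the assumption that changing the name in a copy is acceptable. `dat3_renamed` and `res` are hypothetical names introduced for this illustration.

```r
dat2 <- data.frame(id = c(1, 5, 8), y = c("a", "b", "c"))
dat3 <- data.frame(id2 = c(1, 3, 4), y = c(1, 8, 9))

# Rename the key in a copy so that both data frames call it "id"
dat3_renamed <- dat3
names(dat3_renamed)[names(dat3_renamed) == "id2"] <- "id"

# Both data frames also share a column named "y", so name the key explicitly;
# otherwise merge would try to match on "id" and "y" together
res <- merge(dat2, dat3_renamed, by = "id", suffixes = c(".dat2", ".dat3"))
print(res)
```

The result has the same single matched row (id 1) as the by.x / by.y version above.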



Combining Matrices


<rbind>

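The example below shows the difference between calling matrix() with and without ncol = 2. Without ncol, the 14 values form a single column; with ncol = 2, they are filled column by column into 7 rows and 2 columns.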

> matrix(c(1:7,rep(1,7)))

 > matrix(c(1:7, rep(1, 7)), ncol = 2)

       [,1]

[1,]    1

 [2,]    2

 [3,]    3

 [4,]    4

 [5,]    5

 [6,]    6

 [7,]    7

 [8,]    1

 [9,]    1

[10,]    1

[11,]    1

[12,]    1

[13,]    1

[14,]    1 

  [,1] [,2]

[1,]    1    1

[2,]    2    1

[3,]    3    1

[4,]    4    1

[5,]    5    1

[6,]    6    1

[7,]    7    1 
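
The shapes of the two results can be confirmed with dim(). A minimal sketch, where `v`, `m_default`, and `m_two` are names chosen for this illustration:

```r
v <- c(1:7, rep(1, 7))             # 14 values in total

m_default <- matrix(v)             # no ncol given: a single column
m_two     <- matrix(v, ncol = 2)   # two columns, filled column by column

print(dim(m_default))   # rows, columns
print(dim(m_two))

# Column-major fill: the first 7 values form column 1, the next 7 form column 2
print(m_two[, 1])
print(m_two[, 2])
```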


mat1 and mat2 have the same number of columns because both are created with ncol = 2, so their rows can be combined. As this example shows, the rows of matrices are combined with rbind, the same function used for data frames.


mat1 = matrix(c(1:7, rep(1, 7)), ncol = 2)  

 mat2 = matrix(c(3, 4, 5:9, rep(3, 7)), ncol = 2)

 mat3 = rbind(mat1, mat2)

 > mat1

     [,1] [,2]

[1,]    1    1

[2,]    2    1

[3,]    3    1

[4,]    4    1

[5,]    5    1

[6,]    6    1

[7,]    7    1

> mat2

     [,1] [,2]

[1,]    3    3

[2,]    4    3

[3,]    5    3

[4,]    6    3

[5,]    7    3

[6,]    8    3

[7,]    9    3 

> mat3 

      [,1] [,2]

 [1,]    1    1

 [2,]    2    1

 [3,]    3    1

 [4,]    4    1

 [5,]    5    1

 [6,]    6    1

 [7,]    7    1

 [8,]    3    3

 [9,]    4    3

[10,]    5    3

[11,]    6    3

[12,]    7    3

[13,]    8    3

[14,]    9    3 
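
If the column counts do not agree, rbind refuses to combine the matrices. A minimal sketch with hypothetical matrices `m2` and `m3`; `tryCatch` only captures the error message.

```r
# Hypothetical matrices for illustration
m2 <- matrix(1:4, ncol = 2)   # 2 rows, 2 columns
m3 <- matrix(1:6, ncol = 3)   # 2 rows, 3 columns

# Same number of columns: rows are stacked
stacked <- rbind(m2, m2)
print(dim(stacked))

# Different number of columns: rbind stops with an error
msg <- tryCatch(
  {
    rbind(m2, m3)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```

Unlike data frames, plain matrices like these have no column names to match, so rbind aligns the columns by position only.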


<cbind>

 

 mat4 = matrix(c(0:6, 7:13), nrow = 14)

 mat5 = cbind(mat3, mat4)

 > mat3

      [,1] [,2]

 [1,]    1    1

 [2,]    2    1

 [3,]    3    1

 [4,]    4    1

 [5,]    5    1

 [6,]    6    1

 [7,]    7    1

 [8,]    3    3

 [9,]    4    3

[10,]    5    3

[11,]    6    3

[12,]    7    3

[13,]    8    3

[14,]    9    3

> mat4

      [,1]

 [1,]    0

 [2,]    1

 [3,]    2

 [4,]    3

 [5,]    4

 [6,]    5

 [7,]    6

 [8,]    7

 [9,]    8

[10,]    9

[11,]   10

[12,]   11

[13,]   12

[14,]   13 

> mat5

      [,1] [,2] [,3]

 [1,]    1    1    0

 [2,]    2    1    1

 [3,]    3    1    2

 [4,]    4    1    3

 [5,]    5    1    4

 [6,]    6    1    5

 [7,]    7    1    6

 [8,]    3    3    7

 [9,]    4    3    8

[10,]    5    3    9

[11,]    6    3   10

[12,]    7    3   11

[13,]    8    3   12

[14,]    9    3   13 
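
With cbind the row counts must agree and the column counts add up, as in mat5 above (2 + 1 = 3 columns). A smaller sketch of the same idea, using hypothetical matrices `m` and `extra`; it also shows that a plain vector of matching length can be bound as a new column.

```r
# Hypothetical small matrices for illustration
m     <- matrix(1:6, nrow = 3)   # 3 rows, 2 columns
extra <- matrix(7:9, nrow = 3)   # 3 rows, 1 column

# Row count stays at 3, column counts add up: 2 + 1 = 3
wide <- cbind(m, extra)
print(dim(wide))

# A vector whose length equals the number of rows works as a column too
wide2 <- cbind(m, c(7, 8, 9))
print(wide2)
```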


Combining Vectors


> vec1 = c(1:7) 

> vec1

[1] 1 2 3 4 5 6 7

> vec2 = c(rep(1,5),6,7) 

> vec2

[1] 1 1 1 1 1 6 7


> rbind(vec1, vec2)

 > cbind(vec1, vec2)

      [,1] [,2] [,3] [,4] [,5] [,6] [,7]

vec1    1    2    3    4    5    6    7

vec2    1    1    1    1    1    6    7

     vec1 vec2

[1,]    1    1

[2,]    2    1

[3,]    3    1

[4,]    4    1

[5,]    5    1

[6,]    6    6

[7,]    7    7  
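
Both calls return a matrix, and the variable names become the row or column names. A minimal sketch checking this, where `by_row` and `by_col` are names chosen for this illustration:

```r
vec1 <- 1:7
vec2 <- c(rep(1, 5), 6, 7)

by_row <- rbind(vec1, vec2)   # each vector becomes a row
by_col <- cbind(vec1, vec2)   # each vector becomes a column

print(is.matrix(by_row))
print(dim(by_row))        # rows, columns
print(dim(by_col))

# The vector names are kept as dimension names
print(rownames(by_row))
print(colnames(by_col))
```

The two results hold the same values; one is simply the transpose of the other.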

