
(R) Wilkinson dot plot (+ boxplot)

by jangpiano 2020. 10. 29.

<Wilkinson dot plot>

> library(ggplot2)


> str(mpg)


tibble [234 x 11] (S3: tbl_df/tbl/data.frame)

 $ manufacturer: chr [1:234] "audi" "audi" "audi" "audi" ...

 $ model       : chr [1:234] "a4" "a4" "a4" "a4" ...

 $ displ       : num [1:234] 1.8 1.8 2 2 2.8 2.8 3.1 1.8 1.8 2 ...

 $ year        : int [1:234] 1999 1999 2008 2008 1999 1999 2008 1999 1999 2008 ...

 $ cyl         : int [1:234] 4 4 4 4 6 6 6 4 4 4 ...

 $ trans       : chr [1:234] "auto(l5)" "manual(m5)" "manual(m6)" "auto(av)" ...

 $ drv         : chr [1:234] "f" "f" "f" "f" ...

 $ cty         : int [1:234] 18 21 20 21 16 18 18 18 16 20 ...

 $ hwy         : int [1:234] 29 29 31 30 26 26 27 26 25 28 ...

 $ fl          : chr [1:234] "p" "p" "p" "p" ...

 $ class       : chr [1:234] "compact" "compact" "compact" "compact" ...


> mpg_A<-subset(mpg,drv==4)


> str(mpg_A)

tibble [103 x 11] (S3: tbl_df/tbl/data.frame)

 $ manufacturer: chr [1:103] "audi" "audi" "audi" "audi" ...

 $ model       : chr [1:103] "a4 quattro" "a4 quattro" "a4 quattro" "a4 quattro" ...

 $ displ       : num [1:103] 1.8 1.8 2 2 2.8 2.8 3.1 3.1 2.8 3.1 ...

 $ year        : int [1:103] 1999 1999 2008 2008 1999 1999 2008 2008 1999 2008 ...

 $ cyl         : int [1:103] 4 4 4 4 6 6 6 6 6 6 ...

 $ trans       : chr [1:103] "manual(m5)" "auto(l5)" "manual(m6)" "auto(s6)" ...

 $ drv         : chr [1:103] "4" "4" "4" "4" ...

 $ cty         : int [1:103] 18 16 20 19 15 17 17 15 15 17 ...

 $ hwy         : int [1:103] 26 25 28 27 25 25 25 25 24 25 ...

 $ fl          : chr [1:103] "p" "p" "p" "p" ...

 $ class       : chr [1:103] "compact" "compact" "compact" "compact" ...


> mpg_B<-subset(mpg,drv==4&year==2008)


> str(mpg_B)


tibble [54 x 11] (S3: tbl_df/tbl/data.frame)

 $ manufacturer: chr [1:54] "audi" "audi" "audi" "audi" ...

 $ model       : chr [1:54] "a4 quattro" "a4 quattro" "a4 quattro" "a4 quattro" ...

 $ displ       : num [1:54] 2 2 3.1 3.1 3.1 4.2 5.3 5.3 3.7 3.7 ...

 $ year        : int [1:54] 2008 2008 2008 2008 2008 2008 2008 2008 2008 2008 ...

 $ cyl         : int [1:54] 4 4 6 6 6 8 8 8 6 6 ...

 $ trans       : chr [1:54] "manual(m6)" "auto(s6)" "auto(s6)" "manual(m6)" ...

 $ drv         : chr [1:54] "4" "4" "4" "4" ...

 $ cty         : int [1:54] 20 19 17 15 17 16 14 11 15 14 ...

 $ hwy         : int [1:54] 28 27 25 25 25 23 19 14 19 18 ...

 $ fl          : chr [1:54] "p" "p" "p" "p" ...

 $ class       : chr [1:54] "compact" "compact" "compact" "compact" ...
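Because drv is stored as a character column, the condition drv==4 only works because R converts the number 4 to the string "4" before comparing. A minimal sketch that makes the comparison explicit (mpg_B2 is just an illustrative name, not part of the original code):

```r
library(ggplot2)  # provides the mpg data set

# Compare drv against the string "4" instead of relying on implicit coercion
mpg_B2 <- subset(mpg, drv == "4" & year == 2008)

# Both forms should select the same number of rows
nrow(mpg_B2) == nrow(subset(mpg, drv == 4 & year == 2008))
```

Either form can be used for the plots that follow; the explicit string just makes the column type visible in the code.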


> ggplot(mpg_B,aes(x=displ))+geom_dotplot()

> ggplot(mpg_B,aes(x=displ))+geom_dotplot(binwidth=0.25)
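By default geom_dotplot() bins the data with the dot-density method, where binwidth is the maximum width of a bin and each stack is placed at the middle of the points it covers. The following is only a simplified sketch of that idea in base R, not ggplot2's own code; dot_density_bins is a hypothetical helper name.

```r
# Hypothetical helper: group sorted values into dot-density style bins
dot_density_bins <- function(x, binwidth) {
  x <- sort(x)
  bin <- integer(length(x))
  bin_end <- -Inf
  current <- 0L
  for (i in seq_along(x)) {
    # Start a new bin at the first point that falls outside the current one
    if (x[i] >= bin_end) {
      current <- current + 1L
      bin_end <- x[i] + binwidth
    }
    bin[i] <- current
  }
  data.frame(
    # Each stack sits halfway between the smallest and largest point in its bin
    center = as.numeric(tapply(x, bin, function(v) (min(v) + max(v)) / 2)),
    # Number of dots in each stack
    count = as.integer(table(bin))
  )
}

bins <- dot_density_bins(c(1, 1.1, 1.2, 2, 2.1, 3), binwidth = 0.25)
bins
```

A smaller binwidth gives more, shorter stacks; a larger one merges neighbouring values into fewer, taller stacks.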

<Delete the meaningless y-axis ticks>


In a Wilkinson dot plot drawn with geom_dotplot(), the tick labels on the y-axis carry no meaning: they do not show how many dots are in each stack. You can remove these confusing numbers with scale_y_continuous(breaks=NULL).


> ggplot(mpg_B,aes(x=displ))+geom_dotplot(binwidth=0.25)+scale_y_continuous(breaks=NULL)


<Add ticks to the x-axis>

geom_rug() draws a small tick on the x-axis for every observation, so you can see the exact position of each data point behind the dots.


> ggplot(mpg_B,aes(x=displ))+geom_dotplot(binwidth=0.25)+scale_y_continuous(breaks=NULL)+geom_rug()


<Collecting dots in the middle>


> ggplot(mpg_B,aes(x=displ))+geom_dotplot(binwidth=0.25, stackdir="centerwhole")+geom_rug()+scale_y_continuous(breaks=NULL)



<Draw the grouped Wilkinson dot plots>


> table(ChickWeight$Diet)


  1   2   3   4 

220 120 120 118 

> ChickWeight_A<-subset(ChickWeight,Diet==1|Diet==4)

> table(ChickWeight_A$Diet)


  1   2   3   4 

220   0   0 118 
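The table still lists Diet 2 and 3 with a count of 0 because subset() keeps all factor levels, even the ones that no longer occur. If you want to drop them, one option is base R's droplevels(); the sketch below stores the result in a separate vector (diet_used is an illustrative name) so the original data stays unchanged.

```r
ChickWeight_A <- subset(ChickWeight, Diet == 1 | Diet == 4)

# droplevels() removes factor levels that no longer occur in the data
diet_used <- droplevels(ChickWeight_A$Diet)

table(diet_used)   # only the diets that are still present
levels(diet_used)
```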


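With a categorical x variable, geom_dotplot() needs binaxis="y" so that the values are binned along the y-axis and the dots are stacked sideways within each group.

Without it, the call fails with an error.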


> ggplot(ChickWeight_A, aes(x=Diet, y=weight))+geom_dotplot(binaxis="y",binwidth=7,stackdir="center")

<stackdir="center" VS stackdir="centerwhole">

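Both options center the stacks, but they align the dots differently depending on whether a stack holds an odd or an even number of dots. With "center", every stack is centered exactly, so even-sized stacks sit half a dot off from odd-sized ones. With "centerwhole", the dots of all stacks stay aligned with each other.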


> ggplot(ChickWeight_A, aes(x=Diet, y=weight))+geom_dotplot(binaxis="y",binwidth=7,stackdir="center")

> ggplot(ChickWeight_A, aes(x=Diet, y=weight))+geom_dotplot(binaxis="y",binwidth=7,stackdir="centerwhole")
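To make the odd/even difference concrete, here is a small sketch of the dot offsets within one stack, measured in dot diameters. It reflects my understanding of how the two options behave and is not a ggplot2 function; stack_offsets is a hypothetical helper.

```r
# Hypothetical helper: offsets (in dot diameters) for a stack of n dots
stack_offsets <- function(n, stackdir = c("center", "centerwhole")) {
  stackdir <- match.arg(stackdir)
  i <- seq_len(n) - 1            # 0, 1, ..., n - 1
  if (stackdir == "center") {
    i - (n - 1) / 2              # exactly symmetric around 0
  } else {
    i - floor((n - 1) / 2)       # whole-number offsets only
  }
}

stack_offsets(3, "center")       # odd stack
stack_offsets(3, "centerwhole")  # odd stack: same as "center"
stack_offsets(2, "center")       # even stack: half-dot offsets
stack_offsets(2, "centerwhole")  # even stack: whole offsets, slightly off-center
```

Odd stacks look the same under both options; the two only differ for even stacks.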

 

 


<Add boxplot to dotplot>

> ggplot(ChickWeight_A,aes(x=Diet,y=weight))+geom_boxplot()

> ggplot(ChickWeight_A,aes(x=Diet,y=weight))+geom_boxplot(outlier.colour=NA)

> ggplot(ChickWeight_A,aes(x=Diet,y=weight))+geom_boxplot(fill=NA)

> ggplot(ChickWeight_A, aes(x=Diet, y=weight))+geom_dotplot(binaxis="y",binwidth=7,stackdir="center")+geom_boxplot(fill=NA)


> ggplot(ChickWeight_A, aes(x=Diet, y=weight))+geom_dotplot(binaxis="y",binwidth=7,stackdir="center",fill=NA)+geom_boxplot(fill=NA,outlier.colour=NA)

*How to distinguish the boxplot outliers from the dots of the dot plot

> ggplot(ChickWeight_A, aes(x=Diet, y=weight))+geom_dotplot(binaxis="y",binwidth=7,stackdir="center",fill=NA)+geom_boxplot(fill=NA)

> ggplot(ChickWeight_A, aes(x=Diet, y=weight))+geom_dotplot(binaxis="y",binwidth=7,stackdir="center",fill=NA)+geom_boxplot(fill=NA,outlier.colour="red")


<Move the boxplots away from the dot plots>


Here the boxplots and the dot plots are shifted in opposite directions by converting Diet to a number and adding or subtracting an offset. Because the x-axis is then continuous, the final part of the code, scale_x_continuous(), puts the tick labels back at the factor-level positions, so that each tick sits between a dot plot and its boxplot.

> ggplot(ChickWeight_A,aes(x=Diet,y=weight))+geom_boxplot(aes(x=as.numeric(Diet)+1,group=Diet))+geom_dotplot(aes(x=as.numeric(Diet)-1,group=Diet),width=3,binaxis="y",binwidth=7,stackdir="center")+scale_x_continuous(breaks=1:nlevels(ChickWeight_A$Diet),labels=levels(ChickWeight_A$Diet))
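Since ChickWeight_A still carries the unused levels 2 and 3, as.numeric(Diet) returns 1 and 4, and the axis shows four ticks. A possible variant, sketched under the assumption that you drop the unused levels first and use a smaller offset of 0.2 (cw, the offset, and the boxplot width of 0.25 are my choices, not from the original code):

```r
library(ggplot2)

cw <- subset(ChickWeight, Diet == 1 | Diet == 4)
cw$Diet <- droplevels(cw$Diet)   # levels are now "1" and "4"

p <- ggplot(cw, aes(x = Diet, y = weight)) +
  # Boxplot slightly to the right of each group position
  geom_boxplot(aes(x = as.numeric(Diet) + 0.2, group = Diet), width = 0.25) +
  # Dot plot slightly to the left of each group position
  geom_dotplot(aes(x = as.numeric(Diet) - 0.2, group = Diet),
               binaxis = "y", binwidth = 7, stackdir = "center") +
  # One tick per remaining level, labelled with the diet number
  scale_x_continuous(breaks = 1:nlevels(cw$Diet), labels = levels(cw$Diet))

p
```

With only two levels left, the groups sit at positions 1 and 2, so a small offset is enough to separate each dot plot from its boxplot.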





