
(R) Histogram to Scaled Histogram / Scaled Histogram and Density Curve / binwidth of geom_histogram() / adjust of geom_density()

by jangpiano 2020. 9. 4.

<Histogram -> Scaled Histogram>

density curve: always on or above the horizontal x-axis.

                   total area under the curve = 1
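
The two properties of a density curve can be checked numerically. The following is a minimal sketch, assuming the kernel density estimate from base R's density() on the bwt column of MASS::birthwt stands in for the density curve; the trapezoid sum is only an approximation of the area, so the result is close to 1 rather than exactly 1.

```r
library(MASS)  # provides the birthwt data set

# Kernel density estimate of birth weight (assumption: default settings).
d <- density(birthwt$bwt)

# Property 1: the curve never goes below the x-axis.
never_negative <- all(d$y >= 0)

# Property 2: the area under the curve is (approximately) 1.
# Trapezoid rule over the grid of x values returned by density().
area <- sum(diff(d$x) * (head(d$y, -1) + tail(d$y, -1)) / 2)

never_negative
area
```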


histogram: 'counts' on the vertical y-axis

               

the disadvantage of the histogram: the bar heights depend heavily on the number of observations. So it is not an appropriate way to compare two distributions when the data sets behind the histograms have different numbers of observations.


For example, suppose you want to compare the distribution of birth weight (bwt) for two groups, 'NO smoke' and 'smoke'. Comparing the count histograms directly is misleading because the two groups have different total numbers of observations.

            

> table(birthwt$smoke)

NO smoke    smoke 
     115       74 


> ggplot(data=birthwt,aes(x=bwt))+geom_histogram(fill="white",colour="black")+facet_grid(vars(smoke))


That is the reason to make a scaled histogram.

In a scaled histogram, the area of each bar (not its height) denotes the proportion of the data that falls in that bin, so the areas of all bars add up to 1.

So the height of each bar becomes counts / (width * total number).

proportion = width * height = counts / total number
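
The relation between counts, bin width and bar height can be checked with a small calculation. This is a minimal sketch, assuming base R's hist() with plot = FALSE as a stand-in for the binning that geom_histogram() does (the bins themselves will differ from the ggplot2 defaults, and breaks = 10 is an arbitrary choice).

```r
library(MASS)  # provides the birthwt data set

# Bin the birth weights without drawing anything.
h <- hist(birthwt$bwt, breaks = 10, plot = FALSE)

width <- diff(h$breaks)   # width of each bin
n     <- sum(h$counts)    # total number of observations

# Height of each bar in a scaled histogram.
height <- h$counts / (width * n)

# Proportion of the data in each bin = area of each bar.
proportion <- width * height

all.equal(height, h$density)  # same values hist() reports as density
sum(proportion)               # bar areas add up to 1
```

Because every bar area is a proportion, the total area is 1 no matter how many observations there are, which is what makes two groups of different sizes comparable.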


> ggplot(data=birthwt,aes(x=bwt,y=..density..))+geom_histogram(fill="white",colour="black")+facet_grid(vars(smoke))


 Histogram:         height = counts

 Scaled Histogram:  height = counts / (width * total number of observations)


<Scaled Histogram VS Density Curve>

> ggplot(data=birthwt,aes(x=bwt,y=..density..))+geom_histogram(fill="white",colour="black")+facet_grid(vars(smoke))

> ggplot(data=birthwt,aes(x=bwt))+geom_line(stat="density",adjust=0.25,colour="red")+geom_line(stat="density")+geom_line(stat="density",adjust=2,colour="blue")+facet_grid(vars(smoke))


> ggplot(data=birthwt,aes(x=bwt,y=..density..))+geom_histogram(binwidth=300)+geom_density()+facet_grid(vars(smoke))

> ggplot(data=birthwt,aes(x=bwt,y=..density..))+geom_histogram(binwidth=200)+geom_density(adjust=0.5)+facet_grid(vars(smoke))

> ggplot(data=birthwt,aes(x=bwt,y=..density..))+geom_histogram(binwidth=200)+geom_density(adjust=0.3)+facet_grid(vars(smoke))
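
The adjust values used in the plots above control how smooth the density curve is. The following is a minimal sketch of that effect using base R's density(), on the assumption that geom_density() hands adjust to the same function: the bandwidth actually used is the default bandwidth multiplied by adjust, so values below 1 give a wigglier curve and values above 1 give a smoother one.

```r
library(MASS)  # provides the birthwt data set

x <- birthwt$bwt

d_default <- density(x)                 # adjust = 1 (default bandwidth)
d_rough   <- density(x, adjust = 0.25)  # narrower kernel, wigglier curve
d_smooth  <- density(x, adjust = 2)     # wider kernel, smoother curve

# Bandwidth used by each estimate, relative to the default:
# the ratios are 0.25, 1 and 2.
bw_ratio <- c(d_rough$bw, d_default$bw, d_smooth$bw) / d_default$bw
bw_ratio
```

binwidth plays the corresponding role for the histogram: it is given in the units of the x variable (grams for bwt), and smaller values give narrower bars.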



<EXAMPLE1>


> library(ggplot2)   # ggplot(), geom_histogram(), geom_density()

> library(plyr)      # revalue()

> library(MASS)      # birthwt data set


> birthwt$smoke<-factor(birthwt$smoke)

> birthwt$smoke<-revalue(birthwt$smoke, c("0"="NO smoke","1"="smoke"))

> table(birthwt$smoke)

NO smoke    smoke 
     115       74 

> ggplot(data=birthwt,aes(x=bwt))+geom_histogram(fill="white",colour="black")+facet_grid(vars(smoke))

> ggplot(data=birthwt,aes(x=bwt,y=..density..))+geom_histogram(fill="white",colour="black")+facet_grid(vars(smoke))


> ggplot(data=birthwt,aes(x=bwt,y=..density..))+geom_histogram(binwidth=200,colour="grey",fill="white")+geom_density()+facet_grid(vars(smoke))

<EXAMPLE2>

> ggplot(data=trees,aes(x=Volume))+geom_histogram(fill="yellow",colour="grey")


> ggplot(data=trees,aes(x=Volume,y=..density..))+geom_histogram(fill="yellow",colour="grey")


> ggplot(data=trees,aes(x=Volume,y=..density..))+geom_density()


> ggplot(data=trees,aes(x=Volume,y=..density..))+geom_histogram(fill="yellow",colour="grey")+geom_density()

