<Histogram -> Scale Histogram>
density curve: always on or above the horizontal x-axis.
total area under the curve = 1
histogram: 'counts' on the vertical y-axis
Disadvantage of the histogram: the bar heights depend heavily on the number of observations. So it is not an appropriate way to compare two data sets that have different numbers of observations.
For example, suppose you want to compare the distributions for the two cases 'NO smoke' and 'smoke'. Comparing the count histograms directly is inappropriate because the two groups have different total numbers of observations.
> table(birthwt$smoke)
NO smoke smoke
115 74
> ggplot(data=birthwt,aes(x=bwt))+geom_histogram(fill="white",colour="black")+facet_grid(vars(smoke))
That is the reason to make a scale histogram.
In a scale histogram, the y-axis shows density, so the area of each bar (not its height) is the proportion of the data that falls in that bin.
height = counts / (width * total number of observations)
proportion = width * height = counts / total number of observations
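The relation height = counts / (width * total number of observations) can be checked by hand. The sketch below is only an illustration: it assumes base R's hist() instead of ggplot2, uses the built-in trees data (the Volume column that appears later in these notes), and the variable names are made up for the example.

```r
# Compute the bins without drawing anything.
h <- hist(trees$Volume, plot = FALSE)

width  <- diff(h$breaks)          # width of each bin
n      <- sum(h$counts)           # total number of observations
height <- h$counts / (width * n)  # scale-histogram bar height

# The hand-computed heights agree with the density values hist() reports.
all.equal(height, h$density)

# Each bar's area is the proportion of data in that bin,
# so the areas add up to 1.
sum(width * height)
```

Because every bar area is a proportion, two groups of different sizes end up on the same vertical scale.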
> ggplot(data=birthwt,aes(x=bwt,y=..density..))+geom_histogram(fill="white",colour="black")+facet_grid(vars(smoke))
Histogram | height = counts
Scale Histogram | height = counts / (width * total number of observations)
<Scale Histogram VS Density Curve>
> ggplot(data=birthwt,aes(x=bwt,y=..density..))+geom_histogram(fill="white",colour="black")+facet_grid(vars(smoke))
> ggplot(data=birthwt,aes(x=bwt))+geom_line(stat="density",adjust=0.25,colour="red")+geom_line(stat="density")+geom_line(stat="density",adjust=2,colour="blue")+facet_grid(vars(smoke))
> ggplot(data=birthwt,aes(x=bwt,y=..density..))+geom_histogram(binwidth=300)+geom_density()+facet_grid(vars(smoke))
> ggplot(data=birthwt,aes(x=bwt,y=..density..))+geom_histogram(binwidth=200)+geom_density(adjust=0.5)+facet_grid(vars(smoke))
> ggplot(data=birthwt,aes(x=bwt,y=..density..))+geom_histogram(binwidth=200)+geom_density(adjust=0.3)+facet_grid(vars(smoke))
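The adjust argument in the commands above multiplies the default smoothing bandwidth: values below 1 give a wigglier curve, values above 1 a smoother one. A rough sketch of this, assuming base R's density() (whose adjust argument plays the same role) and the built-in trees data in place of birthwt; the helper name area is hypothetical:

```r
x <- trees$Volume

d_default <- density(x)                 # default bandwidth
d_rough   <- density(x, adjust = 0.25)  # narrower bandwidth, wigglier curve
d_smooth  <- density(x, adjust = 2)     # wider bandwidth, smoother curve

# adjust scales the bandwidth that is actually used.
c(d_rough$bw, d_default$bw, d_smooth$bw)

# Approximate the area under a curve on its evaluation grid (trapezoid rule).
area <- function(d) {
  k <- length(d$y)
  sum(diff(d$x) * (d$y[-1] + d$y[-k]) / 2)
}
sapply(list(d_rough, d_default, d_smooth), area)  # each close to 1
```

Whatever the value of adjust, the curve still encloses a total area of about 1, which is why it can be drawn on top of a scale histogram.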
<EXAMPLE1>
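> library(ggplot2)  # provides ggplot() and the geom_* functions
> library(plyr)     # provides revalue()
> library(MASS)     # provides the birthwt data set
> birthwt$smoke <- factor(birthwt$smoke)
> birthwt$smoke <- revalue(birthwt$smoke, c("0"="NO smoke","1"="smoke"))  # relabel the 0/1 levels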
> table(birthwt$smoke)
NO smoke smoke
115 74
> ggplot(data=birthwt,aes(x=bwt))+geom_histogram(fill="white",colour="black")+facet_grid(vars(smoke))
> ggplot(data=birthwt,aes(x=bwt,y=..density..))+geom_histogram(fill="white",colour="black")+facet_grid(vars(smoke))
> ggplot(data=trees,aes(x=Volume))+geom_histogram(fill="yellow",colour="grey")
> ggplot(data=trees,aes(x=Volume,y=..density..))+geom_histogram(fill="yellow",colour="grey")
> ggplot(data=trees,aes(x=Volume,y=..density..))+geom_density()
> ggplot(data=trees,aes(x=Volume,y=..density..))+geom_histogram(fill="yellow",colour="grey")+geom_density()