<The limitation of scatter plots: geom_point()>
When many points fall in a small area of the plot, they overlap (overplotting), so we cannot tell how dense the region really is or how many points sit at each position. Setting the alpha (transparency) or the shape of the points helps, but neither fully solves the problem.
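> library(ggplot2)   # provides ggplot() and the diamonds data
> ggplot(data=diamonds,aes(x=x*y*z,y=price))+geom_point()              # plain scatter plot: approximate volume (mm^3) vs. price
> ggplot(data=diamonds,aes(x=x*y*z,y=price))+geom_point(alpha=0.1)     # semi-transparent points: dense areas look darker
> ggplot(data=diamonds,aes(x=x*y*z,y=price))+geom_point(shape=1)       # hollow circles instead of filled dots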
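A rough calculation shows why alpha alone cannot fix the problem. Under a simplified model (n identical points drawn exactly on top of each other with the same alpha, ignoring anti-aliasing and the background), the combined opacity is 1 - (1 - alpha)^n. With alpha=0.1, about 50 stacked points are already almost fully opaque, so an area with 50 points and an area with 5,000 points look nearly the same.

```r
# Simplified model (an assumption, not a description of the actual renderer):
# n points stacked exactly on top of each other, each drawn with the same
# alpha and composited with standard "over" blending.
stacked_opacity <- function(n, alpha) 1 - (1 - alpha)^n

n_points <- c(1, 10, 50, 500, 5000)
round(stacked_opacity(n_points, alpha = 0.1), 3)
# [1] 0.100 0.651 0.995 1.000 1.000
```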
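A better approach is to bin the data: stat_bin2d() divides the plot area into a grid of rectangles, counts the points in each rectangle, and maps the count to the fill color.
> ggplot(data=diamonds,aes(x=x*y*z,y=price))+stat_bin2d()          # default: 30 bins on each axis
> ggplot(data=diamonds,aes(x=x*y*z,y=price))+stat_bin2d(bins=50)   # 50 bins on each axis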
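In this example, bins=50 splits each axis into 50 intervals, so the plot area becomes a grid of 50 x 50 = 2,500 rectangles (rectangles that contain no points are not drawn).
Choosing the number of bins yourself controls the resolution of the frequency plot: more bins give a finer picture of where the points are concentrated.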
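The binning idea itself is simple. The sketch below imitates it in base R on simulated data (the variables volume and price here are made up, not the diamonds columns): cut() splits each axis into 50 equal-width intervals and table() counts the points in each of the 50 x 50 cells. ggplot2 computes its own bin boundaries, so this illustrates the idea rather than reproducing its exact algorithm.

```r
set.seed(1)
n <- 10000
volume <- rexp(n, rate = 1 / 100)           # hypothetical skewed "volume"
price  <- 30 * volume + rnorm(n, sd = 300)  # hypothetical "price" rising with volume

bins <- 50
# cut() with a single number splits the range into that many equal-width intervals
xbin <- cut(volume, breaks = bins)
ybin <- cut(price,  breaks = bins)

# 50 x 50 contingency table: one count per rectangle
counts <- table(xbin, ybin)

dim(counts)       # 50 50
length(counts)    # 2500 rectangles in total
sum(counts)       # every point falls into exactly one rectangle
sum(counts > 0)   # non-empty rectangles, i.e. the ones that would be drawn
```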
> ggplot(data=diamonds,aes(x=x*y*z,y=price))+stat_bin2d(bins=50)+scale_fill_gradient(low="lightblue",high="black")                    # custom color gradient for the counts
> ggplot(data=diamonds,aes(x=x*y*z,y=price))+stat_bin2d(bins=50)+scale_fill_gradient(low="lightblue",high="black",limits=c(0,9000))   # fix the fill range to 0-9000
The range of the fill scale, and therefore of the legend, can be set manually with the limits argument, e.g. limits=c(0,9000).
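Before choosing limits, it helps to know the largest count in any rectangle. One way, sketched below, is layer_data(), which returns the data ggplot2 computed for a layer; for binned layers it includes a count column. The sketch uses stat_bin_2d(), the newer name for stat_bin2d().

```r
library(ggplot2)

p <- ggplot(data = diamonds, aes(x = x * y * z, y = price)) +
  stat_bin_2d(bins = 50)

# Computed data for the first layer: one row per non-empty rectangle,
# with the number of diamonds in that rectangle stored in `count`
binned <- layer_data(p)
max(binned$count)   # largest count: a natural upper value for limits=c(0, ...)
nrow(binned)        # number of rectangles actually drawn
```

If max(binned$count) is larger than the upper limit passed to scale_fill_gradient(), the rectangles above that limit will not get a color from the gradient.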
<stat_binhex() - hexagon bins>
> install.packages("hexbin")   # stat_binhex() requires the hexbin package
> library(hexbin)
> ggplot(data=diamonds,aes(x=x*y*z,y=price))+stat_binhex()
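stat_binhex() relies on the hexbin package, and hexbin() can also be called directly to look at the hexagon counts. The sketch below assumes both ggplot2 (for the diamonds data) and hexbin are installed. ggplot2 chooses its own hexagon size, so these counts will not match the plotted hexagons exactly; they only give a rough idea of the range of counts.

```r
library(ggplot2)   # for the diamonds data
library(hexbin)

volume <- diamonds$x * diamonds$y * diamonds$z

# xbins = number of hexagons across the range of the x variable
hb <- hexbin(volume, diamonds$price, xbins = 30)

# hb@count holds one count per non-empty hexagon
length(hb@count)    # how many hexagons contain at least one diamond
summary(hb@count)   # distribution of counts across those hexagons
```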
> ggplot(data=diamonds,aes(x=x*y*z,y=price))+stat_binhex()+scale_fill_gradient(low="lightblue",high="red",limits=c(0,12000))
> ggplot(data=diamonds,aes(x=x*y*z,y=price))+stat_binhex()+scale_fill_gradient(low="lightblue",high="red",breaks=c(1000,2000,3000,4000,5000,6000,7000,8000,9000,10000,11000,12000),limits=c(0,12000))
> ggplot(data=diamonds,aes(x=x*y*z,y=price))+stat_binhex()+scale_fill_gradient(low="lightblue",high="red",breaks=seq(from=0,to=12000,by=1000),limits=c(0,12000))
seq(from=1000,to=12000,by=1000) is a shorter way to write c(1000,2000,3000,4000,5000,6000,7000,8000,9000,10000,11000,12000). The call above starts the sequence at 0, so it also adds a break at 0.
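A quick check in the R console confirms the equivalence, and shows that starting the sequence at 0 adds one extra break.

```r
b_seq <- seq(from = 1000, to = 12000, by = 1000)
b_c   <- c(1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000, 11000, 12000)
all.equal(b_seq, b_c)   # TRUE

length(seq(from = 0, to = 12000, by = 1000))   # 13 breaks: 0, 1000, ..., 12000
length(b_seq)                                  # 12 breaks: 1000, ..., 12000
```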
<Why some bins turn gray when the legend limits are set manually>
> ggplot(data=diamonds,aes(x=x*y*z,y=price))+stat_binhex()+scale_fill_gradient(low="lightblue",high="red",limits=c(0,2000))
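When you set the legend range manually with limits=c(), consider the actual range of the counts carefully.
A gray hexagon means that its count (frequency) falls outside the limits of the legend: with limits=c(0,2000), every hexagon containing more than 2,000 points is out of range and is drawn in gray instead of a color from the gradient.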
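The gray color comes from how continuous scales handle values outside their limits. By default, ggplot2's continuous scales use scales::censor() as the out-of-bounds (oob) function, which turns out-of-range values into NA, and scale_fill_gradient() draws NA values in its na.value color, "grey50" by default. If you would rather have those hexagons take the color of the upper limit, scales::squish() clamps them instead. The counts in the sketch are made up for illustration.

```r
library(ggplot2)
library(scales)   # installed as a dependency of ggplot2

counts <- c(150, 1800, 2500, 7000)   # hypothetical hexagon counts

# Default behavior: out-of-range values become NA (then drawn in na.value, gray)
censor(counts, range = c(0, 2000))
# [1]  150 1800   NA   NA

# Alternative: clamp out-of-range values to the nearest limit
squish(counts, range = c(0, 2000))
# [1]  150 1800 2000 2000

# The plot from this section, but hexagons with more than 2000 points
# get the "red" end of the gradient instead of gray
p <- ggplot(data = diamonds, aes(x = x * y * z, y = price)) +
  stat_bin_hex() +
  scale_fill_gradient(low = "lightblue", high = "red",
                      limits = c(0, 2000), oob = squish)
```

Printing p draws the plot (this needs the hexbin package). With squish, the top color of the legend effectively means "2000 or more".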
<Jittering overlapping points>
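As mentioned above, a limitation of the scatter plot is that it cannot show how many points overlap at the same position.
Jittering, which adds a small random offset to each point so that overlapping points spread apart, can help with this.
However, this method is mainly useful when x or y is a discrete variable, where many points share exactly the same position; with densely packed continuous data such as the diamonds example, shifting points slightly does not reduce the overlap.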
> ggplot(data=mpg,aes(x=class,y=cty))+geom_point()
With geom_point() alone, cars with the same class and cty value are drawn on top of each other, so you cannot tell how many cars each dot represents.
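To see how much geom_point() hides here, you can count the cars at each (class, cty) position directly. A minimal sketch in base R, using the mpg data that comes with ggplot2:

```r
library(ggplot2)   # for the mpg data

n_cars      <- nrow(mpg)                                # 234 cars in total
n_positions <- nrow(unique(mpg[, c("class", "cty")]))   # distinct dots actually visible
n_cars - n_positions                                    # points hidden behind another point

# Number of cars behind each dot, largest first
pos <- as.data.frame(table(class = mpg$class, cty = mpg$cty))
pos <- pos[pos$Freq > 0, ]
head(pos[order(-pos$Freq), ])
```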
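> ggplot(data=mpg,aes(x=class,y=cty))+geom_point(position="jitter")                              # random jitter on both axes (default amount: 40% of the data resolution)
> ggplot(data=mpg,aes(x=class,y=cty))+geom_point(position=position_jitter(width=0.3,height=0))   # horizontal jitter only; height=0 keeps the cty values exact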
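position_jitter() adds uniform random noise to each position; as far as I know, ggplot2 does this with base R's jitter(x, amount = width). The sketch below reproduces the effect of position_jitter(width=0.3,height=0) on made-up data (the category names and values are hypothetical): each x moves by less than 0.3, and y is left untouched.

```r
set.seed(2024)

# Hypothetical data: 3 categories with 5 observations each,
# all sharing the same y value within a category (so they overlap exactly)
category <- rep(c("compact", "midsize", "suv"), each = 5)
x <- as.numeric(factor(category))   # a discrete axis places categories at 1, 2, 3
y <- rep(c(20, 18, 13), each = 5)

width <- 0.3
x_jit <- jitter(x, amount = width)  # uniform noise in (-0.3, 0.3) on x only
y_jit <- y                          # height = 0: y values stay exact

round(x_jit, 2)
```

Because the width is below 0.5, every jittered point stays closer to its own category than to a neighboring one, which is why a width like 0.3 works well on a discrete axis.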