<Interpret Boxplot - summary(), boxplot()$stat, quantile()>
> var1= rnorm(60,5,3)
> var1
[1] 5.72545365 6.46736004 9.05905214 3.81404717
[5] 4.14243244 6.22773545 4.22229999 -0.09668701
[9] 1.92385930 2.68091447 2.02079779 11.82006166
[13] 6.30123986 6.82444436 5.16561901 5.65673597
[17] 6.80099706 8.14009518 5.84585539 4.54134577
[21] 7.05242009 1.82990254 6.48332813 11.63180405
[25] 0.27728859 5.68270070 4.78730331 0.59477055
[29] 7.00621227 8.11715012 3.19798347 12.72825648
[33] 5.54560819 7.94174724 2.86054389 6.34992079
[37] 6.02490056 5.92308010 8.07924831 4.09084170
[41] 8.85541324 9.59660288 1.95769879 12.63010203
[45] 2.35129110 -2.29882675 8.25781232 4.96743026
[49] 7.98006531 9.66460176 5.85261928 2.29375485
[53] 4.54203126 6.20910944 5.99382815 4.02295829
[57] 5.49261074 5.49519994 5.99051028 8.51209252
> quantile(var1,0.25)
25%
4.073871
> quantile(var1,0.5)
50%
5.849237
> quantile(var1,0.75)
75%
7.274752
> quantile(var1,c(0.25,0.5,0.75))
25% 50% 75%
4.073871 5.849237 7.274752
> summary(var1)
var1
Min. :-2.299
1st Qu.: 4.074
Median : 5.849
Mean : 5.698
3rd Qu.: 7.275
Max. :12.728
> boxplot(var1$var1)$stat
[,1]
[1,] -0.09668701
[2,] 4.05689999
[3,] 5.84923734
[4,] 7.49708366
[5,] 12.63010203
> boxplot(var1)
> var1<-as.data.frame(var1)
> library(ggplot2)
> ggplot(data=var1,aes(x=1,y=var1))+geom_boxplot()
> summary(mpg$cty)
Min. 1st Qu. Median Mean 3rd Qu. Max.
9.00 14.00 17.00 16.86 19.00 35.00
> boxplot(mpg$cty)$stat
[,1]
[1,] 9
[2,] 14
[3,] 17
[4,] 19
[5,] 26
attr(,"class")
1
"integer"
> ggplot(data=mpg,aes(x=1,y=cty))+geom_boxplot()
> boxplot(mpg$cty)
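In the var1 transcript above, boxplot()$stat reports 4.057 and 7.497 for the box edges while quantile() gives 4.074 and 7.275, and the whisker ends (-0.097 and 12.630) are not the minimum and maximum shown by summary(). This is consistent with how base R builds a boxplot: boxplot() relies on boxplot.stats(), which takes the box edges from the fivenum() hinges and stops each whisker at the most extreme value within 1.5 box lengths. A minimal sketch on a small made-up vector `x` (illustrative data, not from the transcript):

```r
# Illustrative data (an assumption, not taken from the transcript above)
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 30)

# quantile() interpolates between order statistics (default type = 7)
q <- quantile(x, c(0.25, 0.5, 0.75))

# fivenum() returns min, lower hinge, median, upper hinge, max
f <- fivenum(x)

# boxplot.stats() computes the numbers boxplot() draws (no plot is made here)
b <- boxplot.stats(x)

print(q)        # quartiles by interpolation
print(f)        # hinges can differ slightly from the quartiles
print(b$stats)  # whisker ends exclude values flagged as outliers
print(b$out)    # values that would be drawn as individual points
```

For this vector the hinges are 3 and 8 while quantile() gives 3.25 and 7.75, and 30 appears in `b$out` instead of at the upper whisker end.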
<Interpret boxplot>
- First quartile (Q1, 25th percentile): the median of the lower half of the dataset.
- Median (second quartile, Q2, 50th percentile): the middle value of the dataset.
- Third quartile (Q3, 75th percentile): the median of the upper half of the dataset.
- Outlier: a value less than Q1 - 1.5 × IQR or greater than Q3 + 1.5 × IQR, that is, excessively low or high.
- Sample range: max - min.
- Interquartile range (IQR): Q3 - Q1. It is the length of the box and shows how dispersed the data is: the longer the box, the more dispersed the data; the shorter the box, the less dispersed.
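The definitions above can be checked by hand. The following is a minimal sketch, assuming a small made-up vector `y` and quartiles taken from quantile() with its default settings; the variable names are illustrative only.

```r
# Illustrative data (an assumption for this sketch)
y <- c(2, 4, 4, 5, 6, 7, 8, 9, 10, 25)

q1 <- unname(quantile(y, 0.25))   # first quartile
q3 <- unname(quantile(y, 0.75))   # third quartile
iqr <- q3 - q1                    # interquartile range, same value as IQR(y)

# Fences for the 1.5 * IQR outlier rule
lower_fence <- q1 - 1.5 * iqr
upper_fence <- q3 + 1.5 * iqr

# A value is an outlier if it falls below the lower OR above the upper fence
outliers <- y[y < lower_fence | y > upper_fence]

# Sample range: max - min
sample_range <- max(y) - min(y)

print(c(Q1 = q1, Q3 = q3, IQR = iqr))
print(c(lower = lower_fence, upper = upper_fence))
print(outliers)
print(sample_range)
```

Note that boxplot() itself uses hinges from fivenum() instead of quantile(), so its fences can differ slightly from the ones computed this way.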
<The Position of Median - skewness>
The median is one of the representative values of a dataset and plays a central role in a boxplot.
If the distribution is skewed to the right, the mean is usually greater than the median, the median sits closer to the left (lower) edge of the box, and the right tail is longer than the left tail.
If the median is about equal to the mean and sits in the middle of the box, the distribution looks symmetric, as a normal distribution does.
If the distribution is skewed to the left, the mean is usually less than the median, the median sits closer to the right (upper) edge of the box, and the left tail is longer than the right tail.
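As a quick check of these rules, the sketch below compares the mean, the median, and the median's distance to each box edge for a right-skewed vector and its mirror image. The data and the helper `describe()` are made up for illustration.

```r
# Illustrative data (assumptions for this sketch)
right_skewed <- c(1, 2, 2, 3, 3, 3, 4, 5, 9, 15)
left_skewed  <- 16 - right_skewed   # mirror image of the vector above

# Hypothetical helper: mean, median, and the median's distance to Q1 and Q3
describe <- function(v) {
  q <- unname(quantile(v, c(0.25, 0.5, 0.75)))
  c(mean = mean(v),
    median = q[2],
    median_to_q1 = q[2] - q[1],
    q3_to_median = q[3] - q[2])
}

r <- describe(right_skewed)
l <- describe(left_skewed)

# Right skew: mean > median, median nearer Q1
# Left skew:  mean < median, median nearer Q3
print(rbind(right_skewed = r, left_skewed = l))
```

Calling boxplot(right_skewed, left_skewed, horizontal = TRUE) on the same vectors shows the median line off-center in each box in the direction described above.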