
(R) How to interpret a boxplot in R

by jangpiano 2020. 9. 7.

<Interpret Boxplot - summary(), boxplot()$stats, quantile()>


> var1= rnorm(60,5,3)

> var1

 [1]  5.72545365  6.46736004  9.05905214  3.81404717

 [5]  4.14243244  6.22773545  4.22229999 -0.09668701

 [9]  1.92385930  2.68091447  2.02079779 11.82006166

[13]  6.30123986  6.82444436  5.16561901  5.65673597

[17]  6.80099706  8.14009518  5.84585539  4.54134577

[21]  7.05242009  1.82990254  6.48332813 11.63180405

[25]  0.27728859  5.68270070  4.78730331  0.59477055

[29]  7.00621227  8.11715012  3.19798347 12.72825648

[33]  5.54560819  7.94174724  2.86054389  6.34992079

[37]  6.02490056  5.92308010  8.07924831  4.09084170

[41]  8.85541324  9.59660288  1.95769879 12.63010203

[45]  2.35129110 -2.29882675  8.25781232  4.96743026

[49]  7.98006531  9.66460176  5.85261928  2.29375485

[53]  4.54203126  6.20910944  5.99382815  4.02295829

[57]  5.49261074  5.49519994  5.99051028  8.51209252


> quantile(var1,0.25)

     25% 

4.073871 

> quantile(var1,0.5)

     50% 

5.849237 

> quantile(var1,0.75)

     75% 

7.274752 


> quantile(var1,c(0.25,0.5,0.75))

     25%      50%      75% 

4.073871 5.849237 7.274752


> summary(var1)

      var1       

 Min.   :-2.299  

 1st Qu.: 4.074  

 Median : 5.849  

 Mean   : 5.698  

 3rd Qu.: 7.275  

 Max.   :12.728  


> boxplot(var1)$stats

            [,1]

[1,] -0.09668701

[2,]  4.05689999

[3,]  5.84923734

[4,]  7.49708366

[5,] 12.63010203
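
The numbers above do not match quantile() and summary() exactly: rows 2 and 4 (4.0569, 7.4971) differ from the 25% and 75% quantiles (4.0739, 7.2748), and rows 1 and 5 (-0.0967, 12.6301) differ from Min. and Max. (-2.299, 12.728). In base R, boxplot() takes the box edges from fivenum() (the "hinges"), and when outliers exist it ends the whiskers at the most extreme points that are not outliers. The sketch below illustrates this on a small made-up vector `x` (not the data from this post), chosen so the values can be checked by hand.

```r
# Small hand-checkable sample (hypothetical data); 50 is deliberately extreme.
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 50)

q <- quantile(x, c(0.25, 0.5, 0.75))  # default (type 7) interpolated quartiles
h <- fivenum(x)                       # min, lower hinge, median, upper hinge, max
b <- boxplot(x, plot = FALSE)         # the numbers boxplot() would draw, without plotting

print(q)        # 3.25 5.50 7.75
print(h)        # 1.0 3.0 5.5 8.0 50.0
print(b$stats)  # 1.0 3.0 5.5 8.0 9.0 (upper whisker stops at 9, not at 50)
print(b$out)    # 50 (drawn as a separate point)
```

With many observations the hinges and the quantile() quartiles are usually very close, which is why the two only differ in the second decimal place for var1.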



> boxplot(var1)


> var1<-as.data.frame(var1)

> library(ggplot2)

> ggplot(data=var1,aes(x=1,y=var1))+geom_boxplot()


> summary(mpg$cty)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 

   9.00   14.00   17.00   16.86   19.00   35.00 


> boxplot(mpg$cty)$stats

     [,1]

[1,]    9

[2,]   14

[3,]   17

[4,]   19

[5,]   26

attr(,"class")

        1 

"integer" 


> ggplot(data=mpg,aes(x=1,y=cty))+geom_boxplot()

> boxplot(mpg$cty)



<Interpret boxplot> 

- First quartile (Q1, 25th percentile): the median of the lower half of the dataset.

- Median (second quartile, Q2, 50th percentile): the middle value of the dataset.

- Third quartile (Q3, 75th percentile): the median of the upper half of the dataset.

- Outlier: a value less than Q1 - 1.5 * IQR or greater than Q3 + 1.5 * IQR, that is, excessively low or high.

- Sample range: max - min.

- Interquartile range (IQR): Q3 - Q1. This is the length of the box and shows how dispersed the data is. The longer the box, the more dispersed the data; the shorter the box, the less dispersed the data.
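
The definitions above can be sketched in a few lines of R. This is only an illustration on a made-up vector `x` (hypothetical data), and it uses quantile() for Q1 and Q3 to mirror the list. Note that boxplot() itself uses hinges from fivenum(), so its fences can differ slightly.

```r
# Hypothetical sample; 50 is deliberately extreme.
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 50)

q1  <- unname(quantile(x, 0.25))   # first quartile
q3  <- unname(quantile(x, 0.75))   # third quartile
iqr <- q3 - q1                     # same value as IQR(x)

lower_fence <- q1 - 1.5 * iqr      # values below this count as outliers
upper_fence <- q3 + 1.5 * iqr      # values above this count as outliers
outliers    <- x[x < lower_fence | x > upper_fence]

sample_range <- max(x) - min(x)    # same value as diff(range(x))

print(c(Q1 = q1, Q3 = q3, IQR = iqr))              # 3.25 7.75 4.50
print(c(lower = lower_fence, upper = upper_fence)) # -3.5 14.5
print(outliers)                                    # 50
print(sample_range)                                # 49
```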


<The Position of Median - skewness>

The median is a representative value of the data and plays a central role in a boxplot.

If the distribution is skewed to the right, the mean is greater than the median, the median sits closer to the Q1 edge of the box (the left edge in a horizontal boxplot), and the right tail is longer than the left tail.


If the median is close to the mean and sits near the middle of the box, with whiskers of similar length, the distribution looks roughly symmetric, as a normal distribution does. Note that a boxplot does not draw the mean, so compare it with summary().


If the distribution is skewed to the left, the mean is less than the median, the median sits closer to the Q3 edge of the box (the right edge in a horizontal boxplot), and the left tail is longer than the right tail.
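
A quick way to check these rules is to simulate skewed data and compare the mean with the median, and the two halves of the box with each other. The sketch below is an assumption-laden illustration: the exponential sample, the seed, and the helper `describe()` are all made up for this example.

```r
set.seed(1)                        # arbitrary seed, only for repeatability
right <- rexp(1000, rate = 1)      # long right tail (right-skewed)
left  <- -right                    # mirror image: long left tail (left-skewed)

# Hypothetical helper: mean, median, and the two parts of the box.
describe <- function(v) {
  q <- quantile(v, c(0.25, 0.5, 0.75))
  c(mean      = mean(v),
    median    = unname(q[2]),
    lower_gap = unname(q[2] - q[1]),   # distance from Q1 to the median
    upper_gap = unname(q[3] - q[2]))   # distance from the median to Q3
}

r <- describe(right)
l <- describe(left)
print(r)  # expect mean > median and lower_gap < upper_gap
print(l)  # expect mean < median and lower_gap > upper_gap
```

Calling boxplot(right, left, horizontal = TRUE) on the same vectors shows the same pattern visually: the median line is pushed toward the side opposite the long tail.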







