Adding the median from a dataset to a boxplot - R

I have a large dataset called nkv.gen that I have used to create this boxplot:

> head(nkv.gen)
    Berechnung         Situation   NK  PID Case  Differenz Prozess           Objektart
2 Berechnung 1 Nach Massnahme GS 7.64 3084    1  -4.140527 Murgang single family house
3 Berechnung 2 Nach Massnahme GS 7.68 3084    1  -3.638645 Murgang single family house
4 Berechnung 3 Nach Massnahme GS 7.72 3084    1  -3.136763 Murgang single family house
5 Berechnung 4 Nach Massnahme GS 7.73 3084    1  -3.011292 Murgang single family house
6 Berechnung 5 Nach Massnahme GS 7.78 3084    1  -2.383940 Murgang single family house
7 Berechnung 6 Nach Massnahme GS 4.39 3084    1 -44.918444 Murgang single family house

> str(nkv.gen)
'data.frame':   5062 obs. of  8 variables:
 $ Berechnung: Factor w/ 51 levels "Berechnung 1",..: 1 12 23 34 45 47 48 49 50 2 ...
 $ Situation : Factor w/ 37 levels "Nach Massnahme Ablenk- und Auffangd&auml",..: 10 10 10 10 10 10 10 10 10 10 ...
 $ NK        : num  7.64 7.68 7.72 7.73 7.78 4.39 4.43 4.44 4.45 4.46 ...
 $ PID       : int  3084 3084 3084 3084 3084 3084 3084 3084 3084 3084 ...
 $ Case      : int  1 1 1 1 1 1 1 1 1 1 ...
 $ Differenz : num  -4.14 -3.64 -3.14 -3.01 -2.38 ...
 $ Prozess   : Factor w/ 1 level "Murgang": 1 1 1 1 1 1 1 1 1 1 ...
 $ Objektart : Factor w/ 6 levels "single family house",..: 1 1 1 1 1 1 1 1 1 1 ...

library(ggplot2)

# x-axis labels: object type plus the number of observations per type
xlabs <- paste(levels(nkv.gen$Objektart), "\n(N=", table(nkv.gen$Objektart), ")", sep = "")
p1 <- ggplot(nkv.gen, aes(x = factor(Objektart), y = NK)) +
  geom_boxplot() + scale_x_discrete(labels = xlabs) +
  labs(x = "object type", y = "cost/benefit ratio") +
  ggtitle("cost/benefit ratio (CBR)") +
  # dashed red reference line at a cost/benefit ratio of 1
  geom_hline(yintercept = 1, linetype = "dashed", color = "red") +
  theme(axis.text.x = element_text(size = 9, angle = 45, hjust = 1))
p1

Now I would like to add some information to this existing p1 boxplot, based on the data from nkv.ori. I want to calculate the median for every Objektart within the dataset nkv.ori and plot these values (as red dots) into the existing boxplot p1.

> dput(head(nkv.ori,102))
structure(list(Berechnung = structure(c(51L, 51L, 51L, 51L, 51L, 
51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 
51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 
51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 
51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 
51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 
51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 
51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 
51L, 51L, 51L, 51L, 51L, 51L), .Label = c("Berechnung 1", "Berechnung 10", 
"Berechnung 11", "Berechnung 12", "Berechnung 13", "Berechnung 14", 
"Berechnung 15", "Berechnung 16", "Berechnung 17", "Berechnung 18", 
"Berechnung 19", "Berechnung 2", "Berechnung 20", "Berechnung 21", 
"Berechnung 22", "Berechnung 23", "Berechnung 24", "Berechnung 25", 
"Berechnung 26", "Berechnung 27", "Berechnung 28", "Berechnung 29", 
"Berechnung 3", "Berechnung 30", "Berechnung 31", "Berechnung 32", 
"Berechnung 33", "Berechnung 34", "Berechnung 35", "Berechnung 36", 
"Berechnung 37", "Berechnung 38", "Berechnung 39", "Berechnung 4", 
"Berechnung 40", "Berechnung 41", "Berechnung 42", "Berechnung 43", 
"Berechnung 44", "Berechnung 45", "Berechnung 46", "Berechnung 47", 
"Berechnung 48", "Berechnung 49", "Berechnung 5", "Berechnung 50", 
"Berechnung 6", "Berechnung 7", "Berechnung 8", "Berechnung 9", 
"EconoMe original"), class = "factor"), Situation = structure(c(10L, 
5L, 1L, 9L, 2L, 17L, 8L, 18L, 22L, 23L, 3L, 20L, 27L, 7L, 29L, 
30L, 32L, 33L, 31L, 13L, 12L, 28L, 24L, 21L, 14L, 16L, 4L, 26L, 
11L, 25L, 34L, 6L, 10L, 5L, 1L, 2L, 8L, 3L, 20L, 27L, 7L, 29L, 
30L, 32L, 33L, 31L, 28L, 21L, 16L, 34L, 6L, 8L, 18L, 22L, 23L, 
20L, 27L, 36L, 34L, 10L, 1L, 2L, 37L, 18L, 22L, 23L, 3L, 20L, 
27L, 28L, 24L, 21L, 34L, 10L, 17L, 18L, 22L, 23L, 3L, 20L, 27L, 
29L, 30L, 32L, 33L, 31L, 13L, 28L, 24L, 21L, 4L, 26L, 11L, 25L, 
6L, 3L, 20L, 4L, 26L, 11L, 25L, 34L), .Label = c("Nach Massnahme Ablenk- und Auffangd&auml", 
"Nach Massnahme Bestvariante Fallzug", "Nach Massnahme Camere", 
"Nach Massnahme Daemme und Ablagerungsraum", "Nach Massnahme Damm inkl. Verlaengerung Durchlass", 
"Nach Massnahme Damm und Ablagerungsraum", "Nach Massnahme Digue de derivation-retention et arriere-digue", 
"Nach Massnahme Digues et ouvrage de limitation", "Nach Massnahme Dossierbauwerk", 
"Nach Massnahme GS", "Nach Massnahme Hochpunkt", "Nach Massnahme Hochwasserschutz Mehlbach", 
"Nach Massnahme Hochwasserschutzkonzept Emsbach", "Nach Massnahme Hochwasserschutzmassnahmen kleine Simme", 
"Nach Massnahme Hochwasserschutzvariante 1 B<e4>chibach", "Nach Massnahme Hochwasserschutzvariante 1 Bächibach", 
"Nach Massnahme HWSP Lowigrabo", "Nach Massnahme Lawinen / Holzrechen", 
"Nach Massnahme Leitd<e4>mme", "Nach Massnahme Leitdämme", "Nach Massnahme Massnahmen", 
"Nach Massnahme Murgang Damm", "Nach Massnahme Murgang Netz", 
"Nach Massnahme Renforcement-rehaussement de la digue", "Nach Massnahme Schutzmassnahmen Milibach", 
"Nach Massnahme Strassendurchlass Kantonsstrasse", "Nach Massnahme Tr?hlibach Beckenried, Massnahmen 1 bis 3", 
"Nach Massnahme Variante 1", "Nach Massnahme Variante 1A", "Nach Massnahme Variante 1B", 
"Nach Massnahme Variante 1B+", "Nach Massnahme Variante 2", "Nach Massnahme Variante 3", 
"Nach Massnahme Vorstudie", "Nach Massnahme Gazex + digues de d<e9>viation et d", 
"Nach Massnahme Gazex + digues de déviation et d", "Nach Massnahme Neue Gerinnefuehrung Gafenbach"
), class = "factor"), NK = c(7.97, 0, 12.71, 18.06, 7.18, 1.78, 
2.11, 0, 5.12, 6.51, 1.74, 5.14, 2.2, 5.43, 0.98, 0.88, 1.12, 
1.12, 0.8, 3.35, 0.51, 1.66, 2.51, 0.7, 0.38, 1.27, 4.25, 28.01, 
8.4, 1.84, 1.3, 1.64, 7.97, 0, 12.71, 7.18, 2.11, 1.74, 5.14, 
2.2, 5.43, 0.98, 0.88, 1.12, 1.12, 0.8, 1.66, 0.7, 1.27, 1.3, 
1.64, 2.11, 0, 5.12, 6.51, 5.14, 2.2, 0.22, 1.3, 7.97, 12.71, 
7.18, 0, 0, 5.12, 6.51, 1.74, 5.14, 2.2, 1.66, 2.51, 0.7, 1.3, 
7.97, 1.78, 0, 5.12, 6.51, 1.74, 5.14, 2.2, 0.98, 0.88, 1.12, 
1.12, 0.8, 3.35, 1.66, 2.51, 0.7, 4.25, 28.01, 8.4, 1.84, 1.64, 
0, 0, 0, 0, 0, -0.22, 0), PID = c(3084L, 2844L, 2707L, 2707L, 
2707L, 2547L, 2534L, 2497L, 2497L, 2497L, 2494L, 2492L, 2478L, 
2383L, 2351L, 2351L, 2351L, 2351L, 2351L, 2341L, 2193L, 2190L, 
2187L, 2157L, 2104L, 2103L, 2079L, 2079L, 2079L, 2079L, 2026L, 
2022L, 3084L, 2844L, 2707L, 2707L, 2534L, 2494L, 2492L, 2478L, 
2383L, 2351L, 2351L, 2351L, 2351L, 2351L, 2190L, 2157L, 2103L, 
2026L, 2022L, 2534L, 2497L, 2497L, 2497L, 2492L, 2478L, 2125L, 
2026L, 3084L, 2707L, 2707L, 2639L, 2497L, 2497L, 2497L, 2494L, 
2492L, 2478L, 2190L, 2187L, 2157L, 2026L, 3084L, 2547L, 2497L, 
2497L, 2497L, 2494L, 2492L, 2478L, 2351L, 2351L, 2351L, 2351L, 
2351L, 2341L, 2190L, 2187L, 2157L, 2079L, 2079L, 2079L, 2079L, 
2022L, 2494L, 2492L, 2079L, 2079L, 2079L, 2079L, 2026L), Case = c(1L, 
2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 
16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 25L, 26L, 27L, 28L, 
29L, 30L, 31L, 32L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 
11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 1L, 2L, 3L, 4L, 
5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 
12L, 13L, 14L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 
12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L, 1L, 2L, 
3L, 4L, 5L, 6L, 7L), Differenz = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0), Prozess = structure(c(1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L), .Label = "Murgang", class = "factor"), Objektart = structure(c(1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 
6L, 6L, 6L, 6L, 6L), .Label = c("single family house", "garage", 
"hotel", "industry", "appartment building", "public building"
), class = "factor")), .Names = c("Berechnung", "Situation", 
"NK", "PID", "Case", "Differenz", "Prozess", "Objektart"), row.names = c(1L, 
52L, 103L, 154L, 205L, 256L, 307L, 358L, 409L, 460L, 511L, 562L, 
613L, 664L, 715L, 766L, 817L, 868L, 919L, 970L, 1021L, 1072L, 
1123L, 1174L, 1225L, 1276L, 1327L, 1378L, 1429L, 1480L, 1531L, 
1582L, 1633L, 1684L, 1735L, 1786L, 1837L, 1888L, 1939L, 1990L, 
2041L, 2092L, 2143L, 2194L, 2245L, 2296L, 2347L, 2398L, 2449L, 
2500L, 2551L, 2602L, 2653L, 2704L, 2755L, 2806L, 2857L, 2908L, 
2959L, 3010L, 3061L, 3112L, 3163L, 3214L, 3265L, 3316L, 3367L, 
3418L, 3469L, 3520L, 3571L, 3622L, 3673L, 3724L, 3775L, 3826L, 
3877L, 3928L, 3979L, 4030L, 4081L, 4132L, 4183L, 4234L, 4285L, 
4336L, 4387L, 4438L, 4489L, 4540L, 4591L, 4642L, 4693L, 4744L, 
4795L, 4846L, 4859L, 4910L, 4961L, 5012L, 5063L, 5114L), class = "data.frame")

I thought this would be easy, as the datasets have an identical layout, but I am stuck. Any tips or advice?

This is a much simpler example, but it does the same thing. I think your issue is that you started by providing your dataset and the aes() variables inside ggplot(). It's better to supply them later, within geom_boxplot() and geom_point(), especially when you want to plot using multiple datasets, like this:

library(dplyr)
library(ggplot2)

# first dataset
dt1 = mtcars %>% mutate(cyl = factor(cyl))

# second dataset (values for each cyl; could be the medians after you calculate them)
dt2 = data.frame(cyl = factor(c(4,6,8)),
                 value = c(100, 188, 358))

ggplot() +
  geom_boxplot(data = dt1, aes(cyl,disp)) +
  geom_point(data = dt2, aes(cyl, value), col="red")
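
Applied to the question's data, the same idea might look like the sketch below. It is only an illustration: the two small data frames are made-up stand-ins for nkv.gen and nkv.ori, reduced to the columns Objektart and NK, and the name nkv.med is an assumption, not something from the original post. The medians are computed with base R's aggregate() and added to the existing plot as a second layer with its own data argument.

```r
library(ggplot2)

# Hypothetical stand-ins for nkv.gen and nkv.ori, reduced to the two
# columns the plot needs (Objektart and NK).
set.seed(1)
types <- c("single family house", "garage", "hotel")
nkv.gen <- data.frame(
  Objektart = factor(rep(types, each = 20), levels = types),
  NK        = runif(60, min = 0, max = 10)
)
nkv.ori <- data.frame(
  Objektart = factor(rep(types, each = 5), levels = types),
  NK        = c(1, 2, 3, 4, 5,  2, 4, 6, 8, 10,  0, 0, 1, 1, 7)
)

# One median of NK per Objektart; the result keeps the column names
# Objektart and NK, with one row per object type.
nkv.med <- aggregate(NK ~ Objektart, data = nkv.ori, FUN = median)

# Boxplot built from the large dataset, as in the question.
p1 <- ggplot(nkv.gen, aes(x = factor(Objektart), y = NK)) +
  geom_boxplot()

# Overlay the medians as red points; this layer uses its own data frame.
p2 <- p1 +
  geom_point(data = nkv.med, aes(x = factor(Objektart), y = NK),
             colour = "red", size = 3)
p2
```

With the real data, the aggregate() call should work unchanged on nkv.ori, provided its Objektart levels match those of nkv.gen so that the points line up with the boxes.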
