A for loop with dplyr summarise returns different results than group_by

Question

I get strange results when I apply dplyr's summarise function inside a for loop, and I don't know why or how to fix it.

library(dplyr)

test <- data.frame(title = c("a", "b", "c","a","b","c", "a", "b", "c","a","b","c"),
                       category = c("A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B"),
                       sex = c("m", "m", "m", "f", "f", "f", "m", "m", "m", "f", "f", "f"),
                       salary = c(50,70,90,40,60,85, 220,270,350,180,200,330))

category_list <- unique(test$category)

tmp = list()

for (category in category_list) {
  # Create an average salary line for the category
  tmp[category] <- test %>% 
    filter(category == category) %>%
    summarise(mean(salary))
  print(tmp)
}

This is the output I get:

$A
[1] 162.0833

$A
[1] 162.0833

$B
[1] 162.0833

In contrast, the group_by() function returns the correct results:

    test %>% group_by(category) %>% summarise(mean(salary))
# A tibble: 2 x 2
  category `mean(salary)`
  <fct>             <dbl>
1 A                  65.8
2 B                 258.

Substituting a specific category also returns the correct result:

test %>% 
        filter(category == "A") %>%
        summarise(mean(salary))
      mean(salary)
1     65.83333

So is there perhaps something wrong with the category_list object? Surprisingly, when I use the first element of category_list, I also get the correct answer:

test %>% 
    filter(category == category_list[1]) %>%
    summarise(mean(salary))
  mean(salary)
1     65.83333

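The reason I want to figure this out (without using group_by) is that I am trying to write a script that creates multiple ggplot objects and then combines them with the gridExtra library.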

Maybe I'm wrong and group_by could be used, but the only approach I can think of follows this pseudocode:

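1) Create a list of means by category, to use in the geom_hline() argument
2) Subset the data frame by category; each subset will be used in ggplot with its own geom_hline()
3) Create a list of plot objects, one for each category
4) Use grid.arrange() from the gridExtra library, outside the for loop, to combine the plots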

Here is my code so far (it does not work):

library(ggplot2)
library(gridExtra)
p = list()
avg_line = list()
tmp = list()
category_data = data.frame()
for (category in category_list) {
  # Create an average salary line for the category
  tmp[[category]] <- test %>% 
    filter(category == category) %>%
    summarise(mean(salary))
  avg_line[[category]] <- tmp[[2]]

  # Subset data frame on category 
  category_data[[category]] <- test %>% filter(category == category)

  # Make plots for each category
  p[[category]] <-
    ggplot(category_data[[category]], aes(x = title, y = salary)) +
  geom_line(color = "white") +
  geom_point(aes(color =sex)) +
  scale_color_manual(values = c("#F49171", "#81C19C")) +
  geom_hline(yintercept = avg_line[[category]], color = "white", alpha = 0.6, size = 1) +
  theme(legend.position = "none",
      panel.background = element_rect(color = "#242B47", fill = "#242B47"),
      plot.background = element_rect(color = "#242B47", fill = "#242B47"),
      axis.line = element_line(color = "grey48", size = 0.05, linetype = "dotted"),
      axis.text = element_text(family = "Georgia", color = "white"),
      axis.text.x = element_text(angle = 90),
      # Get rid of the y- and x-axis titles
      axis.title.y=element_blank(),
      axis.title.x=element_blank(),
      panel.grid.major.y = element_line(color = "grey48", size = 0.05),
      panel.grid.minor.y = element_blank(),
      panel.grid.major.x = element_blank())
}

grid.arrange(grobs = p, nrow = 1)

My desired output looks like this: [image not included]

Answer 1

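The problem in your for loop is the statement filter(category == category). It is always true, because inside filter() both occurrences of category are taken from the data, i.e. the column is compared with itself and the loop variable is never used. That is why every iteration returns the overall mean, 162.0833. If you really need a for loop, just rename the iterator so that it no longer matches a column name.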
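A minimal sketch of that fix, using the data from the question. The iterator name `cat_name` and the list name `category_means` are made up for illustration. The second variant keeps the name `category` and disambiguates with the `.data` and `.env` pronouns, which assumes a reasonably recent dplyr/rlang.

```r
library(dplyr)

# Same data as in the question
test <- data.frame(
  title    = rep(c("a", "b", "c"), times = 4),
  category = rep(c("A", "B"), each = 6),
  sex      = rep(rep(c("m", "f"), each = 3), times = 2),
  salary   = c(50, 70, 90, 40, 60, 85, 220, 270, 350, 180, 200, 330)
)

# Inside filter(), `category` is the column on both sides of `==`,
# so the condition is TRUE for every row and nothing is filtered out.
n_kept <- test %>% filter(category == category) %>% nrow()

# Fix 1: use an iterator name that is not a column name
# (`cat_name` is a hypothetical name chosen for this sketch)
category_means <- list()
for (cat_name in as.character(unique(test$category))) {
  category_means[[cat_name]] <- test %>%
    filter(category == cat_name) %>%
    summarise(mean_salary = mean(salary)) %>%
    pull(mean_salary)
}

# Fix 2: keep the name and say explicitly which one is the column
# (.data) and which one is the variable in the calling environment (.env)
category <- "A"
mean_a <- test %>%
  filter(.data$category == .env$category) %>%
  summarise(mean_salary = mean(salary)) %>%
  pull(mean_salary)
```

Here `n_kept` equals the full row count, which shows that the original condition filtered nothing, while `category_means` holds one mean per category that agrees with the group_by() result.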

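However, you don't need grid.arrange at all. facet_wrap gives you exactly what you need (you may have to reformat the facet labels a little; they are controlled by the theme elements whose names start with strip):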

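library(dplyr)
library(ggplot2)

# One mean salary per category, used below for the horizontal lines
category_means <- test %>% 
  group_by(category) %>%
  summarize_at(vars(salary), mean)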

p <- test %>%
  # group_by(category) %>%
  ggplot(aes(x = title, y = salary, color = sex)) + 
  facet_wrap(~ category, nrow = 1, scales = "free_y") +  
  geom_line(color = 'white') + 
  geom_point() + 
  scale_color_manual(values = c("#F49171", "#81C19C")) +
  geom_hline(data = category_means, aes(yintercept = salary), color = 'white', alpha = 0.6, size = 1) + 
  theme(legend.position = "none",
    panel.background = element_rect(color = "#242B47", fill = "#242B47"),
    plot.background = element_rect(color = "#242B47", fill = "#242B47"),
    axis.line = element_line(color = "grey48", size = 0.05, linetype = "dotted"),
    axis.text = element_text(family = "Georgia", color = "white"),
    axis.text.x = element_text(angle = 90),
    # Get rid of the y- and x-axis titles
    axis.title.y=element_blank(),
    axis.title.x=element_blank(),
    panel.grid.major.y = element_line(color = "grey48", size = 0.05),
    panel.grid.minor.y = element_blank(),
    panel.grid.major.x = element_blank())
p
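As a rough illustration of the remark about the strip theme elements, the facet labels could be restyled to match the dark background along these lines. The object name `p_strip` and the specific colour choices are assumptions for this sketch, not part of the original answer.

```r
library(ggplot2)

# Same data as in the question
test <- data.frame(
  title    = rep(c("a", "b", "c"), times = 4),
  category = rep(c("A", "B"), each = 6),
  sex      = rep(rep(c("m", "f"), each = 3), times = 2),
  salary   = c(50, 70, 90, 40, 60, 85, 220, 270, 350, 180, 200, 330)
)

# Reduced version of the faceted plot; only the strip styling is of interest here
p_strip <- ggplot(test, aes(x = title, y = salary, color = sex)) +
  facet_wrap(~ category, nrow = 1, scales = "free_y") +
  geom_point() +
  theme(
    # Box behind each facet label
    strip.background = element_rect(color = "#242B47", fill = "#242B47"),
    # Text of the facet label itself
    strip.text = element_text(color = "white", face = "bold")
  )
```

The same two arguments can be added to the theme() call in the full plot above.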
