Summarise with dplyr - keep one variable always at the bottom

Question

Can anyone help me with this? I am grouping and summarising spending data from several companies, and the output looks like this:

df <- data.frame(
    Column1 = c("Other", "Brand1", "Brand2", "Brand3", "Brand4", "Brand5"),
    Column2 = c(NA, "Subbrand1", "Subbrand2", "Subbrand3", "Subbrand4", "Subbrand5"),
    Spendings = c(1000, 500, 250, 200, 150, 100)
)

  Column1   Column2 Spendings
1   Other      <NA>      1000
2  Brand1 Subbrand1       500
3  Brand2 Subbrand2       250
4  Brand3 Subbrand3       200
5  Brand4 Subbrand4       150
6  Brand5 Subbrand5       100

The "Other" row is at the top, but I want that particular row at the bottom because of a later visualisation, like this:

df <- data.frame(
    Column1 = c("Brand1", "Brand2", "Brand3", "Brand4", "Brand5", "Other"),
    Column2 = c("Subbrand1", "Subbrand2", "Subbrand3", "Subbrand4", "Subbrand5", NA),
    Spendings = c(500, 250, 200, 150, 100, 1000)
)

  Column1   Column2 Spendings
1  Brand1 Subbrand1       500
2  Brand2 Subbrand2       250
3  Brand3 Subbrand3       200
4  Brand4 Subbrand4       150
5  Brand5 Subbrand5       100
6   Other      <NA>      1000

This is the code I use to create df, including some pseudo-code for what I want, which obviously does not work :-(

df <- df%>%
    group_by(Column1, Column2) %>%
    summarise(Spendings = sum(Spendings)) %>%
    arrange(desc(Spendings), lastrow = "others")

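Is there a way to get the "Other" row to the bottom within a dplyr workflow? Subsetting and rbind-ing is of course possible, but is there a more suitable approach?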

Answer 1

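We can pass a logical vector to arrange. Logical values sort with FALSE before TRUE, so every row where Column1 == "Other" is FALSE comes first and the "Other" row ends up last: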

library(dplyr)

df %>%
   arrange(Column1 == "Other")
#  Column1   Column2 Spendings
#1  Brand1 Subbrand1       500
#2  Brand2 Subbrand2       250
#3  Brand3 Subbrand3       200
#4  Brand4 Subbrand4       150
#5  Brand5 Subbrand5       100
#6   Other      <NA>      1000
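To see why the logical sort key works, here is a minimal base R sketch. The vector `brands` is made up for illustration and is not part of the question's data.

```r
# Hypothetical brand names, for illustration only
brands <- c("Other", "Brand1", "Brand2")

# The comparison yields a logical vector: TRUE FALSE FALSE
is_other <- brands == "Other"

# order() treats FALSE as 0 and TRUE as 1, so FALSE positions come first
idx <- order(is_other)

# Reordered: "Brand1" "Brand2" "Other"
sorted_brands <- brands[idx]
print(sorted_brands)
```

arrange(Column1 == "Other") applies the same idea to whole rows of the data frame.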

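Another option is to convert the column to a factor whose levels are specified in the desired order, so that "Other" is the last level. arrange then sorts by the levels. This may be the better option, because the order is also preserved when plotting.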

un1 <- c(setdiff(unique(df$Column1), "Other"), "Other")
df %>%
    mutate(Column1 = factor(Column1, levels = un1)) %>%
    arrange(Column1)
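As a rough illustration of what the level vector contains and how it drives the ordering, here is a base R sketch on a cut-down, hypothetical version of the data (the name `small` is an assumption made for this example):

```r
# Hypothetical three-row version of the data, for illustration only
small <- data.frame(
  Column1 = c("Other", "Brand1", "Brand2"),
  Spendings = c(1000, 500, 250)
)

# All values except "Other" in order of first appearance, then "Other" last
un1 <- c(setdiff(unique(small$Column1), "Other"), "Other")
print(un1)  # "Brand1" "Brand2" "Other"

# Factors are sorted by their integer level codes, not alphabetically
small$Column1 <- factor(small$Column1, levels = un1)
codes <- as.integer(small$Column1)
print(codes)  # 3 1 2, so "Other" carries the highest code

# Ordering by the factor therefore puts "Other" at the bottom
sorted_small <- small[order(small$Column1), ]
print(sorted_small)
```

Note that this orders the brands by their position in the levels, not by Spendings. If the brands should also be sorted by spending, a second sort key is needed.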

If we use the forcats package, the function fct_relevel makes it easy to modify the levels:

library(forcats)
df %>% 
  mutate(Column1 = fct_relevel(Column1, "Other", after = Inf)) %>% 
  arrange(Column1)

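From the examples in ?fct_relevel:

Using 'Inf' allows you to relevel to the end when the number of levels is unknown or variable (e.g. vectorised operations)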
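A small sketch of the same call on a standalone factor may make the effect of after = Inf easier to see. The level names here are invented for illustration.

```r
library(forcats)

# Hypothetical factor; by default the levels are sorted alphabetically
f <- factor(c("Other", "Zeta", "Alpha"))
print(levels(f))  # "Alpha" "Other" "Zeta"

# Move "Other" to the end, however many levels there are
f2 <- fct_relevel(f, "Other", after = Inf)
print(levels(f2))  # "Alpha" "Zeta" "Other"
```

Only the level order changes; the values stored in the factor stay the same.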

Answer 2

df <- df %>%
    group_by(Column1, Column2) %>%
    summarise(Spendings = sum(Spendings)) %>%
    arrange(Column1 == "Other", desc(Spendings))
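For a self-contained run of this pipeline, here is a sketch on made-up raw data with several rows per brand. The data frame `raw` and its values are assumptions for illustration, and `.groups = "drop"` assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical raw data with repeated brands (not from the question)
raw <- data.frame(
  Column1 = c("Other", "Brand1", "Brand1", "Brand2", "Other"),
  Column2 = c(NA, "Subbrand1", "Subbrand1", "Subbrand2", NA),
  Spendings = c(600, 300, 200, 250, 400)
)

result <- raw %>%
  group_by(Column1, Column2) %>%
  # Sum per brand/sub-brand; .groups = "drop" returns an ungrouped tibble
  summarise(Spendings = sum(Spendings), .groups = "drop") %>%
  # First key: non-"Other" rows first; second key: largest spending first
  arrange(Column1 == "Other", desc(Spendings))

print(result)
```

The first sort key pushes "Other" to the bottom and the second sorts the remaining brands by descending spending, so "Other" stays last even though its total is the largest.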

Answer 1
Score 2, accepted, 2019-04-24 15:22:52

Answer 2
Score 0, 2019-04-24 15:25:58
