Summarise with dplyr - One variable always on bottom

Can anyone help me with this? I grouped and summarised spending data from multiple companies, and the output looks like this:

df <- data.frame(
    Column1 = c("Other", "Brand1", "Brand2", "Brand3", "Brand4", "Brand5"),
    Column2 = c(NA, "Subbrand1", "Subbrand2", "Subbrand3", "Subbrand4", "Subbrand5"),
    Spendings = c(1000, 500, 250, 200, 150, 100)
)

  Column1   Column2 Spendings
1   Other      <NA>      1000
2  Brand1 Subbrand1       500
3  Brand2 Subbrand2       250
4  Brand3 Subbrand3       200
5  Brand4 Subbrand4       150
6  Brand5 Subbrand5       100

The "Other" row is on top, but I want that specific row at the bottom for a later visualization (like here):

df <- data.frame(
    Column1 = c("Brand1", "Brand2", "Brand3", "Brand4", "Brand5", "Other"),
    Column2 = c("Subbrand1", "Subbrand2", "Subbrand3", "Subbrand4", "Subbrand5", NA),
    Spendings = c(500, 250, 200, 150, 100, 1000)
)

  Column1   Column2 Spendings
1  Brand1 Subbrand1       500
2  Brand2 Subbrand2       250
3  Brand3 Subbrand3       200
4  Brand4 Subbrand4       150
5  Brand5 Subbrand5       100
6   Other      <NA>      1000

This is the code I used to create the df, with an extra argument showing what I would like to do, which obviously does not work :-(.

df <- df%>%
    group_by(Column1, Column2) %>%
    summarise(Spendings = sum(Spendings)) %>%
    arrange(desc(Spendings), lastrow = "others")

Is there a way to get the "Other" row to the bottom inside the dplyr workflow? Subsetting and rbinding is of course possible, but is there a more suitable way?

We can use a logical vector in arrange. Logical values sort with FALSE before TRUE (FALSE is treated as 0 and TRUE as 1), so the rows where Column1 == "Other" is TRUE end up last.

df %>% 
   arrange(Column1 == "Other")
#  Column1   Column2 Spendings
#1  Brand1 Subbrand1       500
#2  Brand2 Subbrand2       250
#3  Brand3 Subbrand3       200
#4  Brand4 Subbrand4       150
#5  Brand5 Subbrand5       100
#6   Other      <NA>      1000
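
To see why the logical key works, here is a small base R sketch. The shortened data frame and the `order()` call are my own illustration, not part of the original answer; `order()` is used as a rough base R counterpart of `arrange()`.

```r
# Logical values sort as FALSE (0) before TRUE (1)
sort(c(TRUE, FALSE, TRUE))
# [1] FALSE  TRUE  TRUE

# Shortened, hypothetical version of the data from the question
df <- data.frame(
    Column1 = c("Other", "Brand1", "Brand2"),
    Spendings = c(1000, 500, 250),
    stringsAsFactors = FALSE
)

# Rows where the test is FALSE come first; ties keep their original order
res <- df[order(df$Column1 == "Other"), ]
res$Column1
# [1] "Brand1" "Brand2" "Other"
```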

Another option is to convert the column to a factor with the levels specified so that 'Other' is the last level; arrange then orders the rows by the levels. This might be the better option, as the order is also maintained when plotting.

un1 <- c(setdiff(unique(df$Column1), "Other"), "Other")
df %>%
    mutate(Column1 = factor(Column1, levels = un1)) %>%
    arrange(Column1)
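
As a quick check of the claim that the order survives later steps, the sketch below (with a shortened, hypothetical data frame; `df2` and `df3` are names I made up) shows that the levels stay the same even after the rows are sorted by another column. Plotting packages such as ggplot2 order a discrete axis by the factor levels, not by the row order.

```r
library(dplyr)

# Shortened, hypothetical version of the data from the question
df <- data.frame(
    Column1 = c("Other", "Brand1", "Brand2"),
    Spendings = c(1000, 500, 250),
    stringsAsFactors = FALSE
)

# Put "Other" last in the level order
un1 <- c(setdiff(unique(df$Column1), "Other"), "Other")
df2 <- df %>%
    mutate(Column1 = factor(Column1, levels = un1)) %>%
    arrange(Column1)

levels(df2$Column1)
# [1] "Brand1" "Brand2" "Other"

# Re-sorting the rows changes the row order but not the levels
df3 <- df2 %>% arrange(desc(Spendings))
as.character(df3$Column1)
# [1] "Other"  "Brand1" "Brand2"
levels(df3$Column1)
# [1] "Brand1" "Brand2" "Other"
```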

If we use the forcats package, fct_relevel is a useful function for modifying the levels easily:

library(forcats)
df %>% 
  mutate(Column1 = fct_relevel(Column1, "Other", after = Inf)) %>% 
  arrange(Column1)

According to the examples in ?fct_relevel:

Using 'Inf' allows you to relevel to the end when the number of levels is unknown or variable (e.g. vectorised operations)
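
A minimal sketch of the `after` argument on a standalone factor (the factor `f` is made up for illustration, with explicit levels so the result does not depend on the locale's sort order):

```r
library(forcats)

# Hypothetical factor with "Other" as the first level
f <- factor(c("Other", "Brand1", "Brand2"),
            levels = c("Other", "Brand1", "Brand2"))

# after = Inf moves "Other" to the end, however many levels there are
levels(fct_relevel(f, "Other", after = Inf))
# [1] "Brand1" "Brand2" "Other"

# A finite value places it after that many of the remaining levels
levels(fct_relevel(f, "Other", after = 1))
# [1] "Brand1" "Other"  "Brand2"
```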

df <- df %>%
    group_by(Column1, Column2) %>%
    summarise(Spendings = sum(Spendings)) %>%
    arrange(Column1 == "Other", desc(Spendings))
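
For a self-contained run of this pipeline, here is a sketch on made-up, unaggregated data (the `raw` data frame and its numbers are hypothetical). It also passes `.groups = "drop"` to summarise, which assumes dplyr 1.0.0 or later and returns an ungrouped result.

```r
library(dplyr)

# Hypothetical raw data with repeated brand rows
raw <- data.frame(
    Column1 = c("Other", "Brand1", "Brand1", "Brand2", "Other"),
    Column2 = c(NA, "Subbrand1", "Subbrand1", "Subbrand2", NA),
    Spendings = c(600, 300, 200, 250, 400),
    stringsAsFactors = FALSE
)

res <- raw %>%
    group_by(Column1, Column2) %>%
    summarise(Spendings = sum(Spendings), .groups = "drop") %>%
    # First key: "Other" last; second key: largest spendings first
    arrange(Column1 == "Other", desc(Spendings))

res$Column1
# [1] "Brand1" "Brand2" "Other"
res$Spendings
# [1]  500  250 1000
```

The brands are sorted by descending spendings, while "Other" stays at the bottom even though it has the largest total.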
