Summarise with dplyr - One variable always on bottom
Can anyone help me with this? I grouped and summarised spending data from multiple companies, and the output looks like this:
df <- data.frame(
  Column1 = c("Other", "Brand1", "Brand2", "Brand3", "Brand4", "Brand5"),
  Column2 = c(NA, "Subbrand1", "Subbrand2", "Subbrand3", "Subbrand4", "Subbrand5"),
  Spendings = c(1000, 500, 250, 200, 150, 100)
)
  Column1   Column2 Spendings
1   Other      <NA>      1000
2  Brand1 Subbrand1       500
3  Brand2 Subbrand2       250
4  Brand3 Subbrand3       200
5  Brand4 Subbrand4       150
6  Brand5 Subbrand5       100
The "Other" row is on top; however, I want that specific row at the bottom for later visualization, like here:
df <- data.frame(
  Column1 = c("Brand1", "Brand2", "Brand3", "Brand4", "Brand5", "Other"),
  Column2 = c("Subbrand1", "Subbrand2", "Subbrand3", "Subbrand4", "Subbrand5", NA),
  Spendings = c(500, 250, 200, 150, 100, 1000)
)
  Column1   Column2 Spendings
1  Brand1 Subbrand1       500
2  Brand2 Subbrand2       250
3  Brand3 Subbrand3       200
4  Brand4 Subbrand4       150
5  Brand5 Subbrand5       100
6   Other      <NA>      1000
This is the code I used to create the df, together with a guess at the syntax I would like to have, which obviously does not work :-(
df <- df %>%
  group_by(Column1, Column2) %>%
  summarise(Spendings = sum(Spendings)) %>%
  arrange(desc(Spendings), lastrow = "others")
Is there a way to get the "Other" row to the bottom inside the dplyr workflow? Subsetting and rbinding is of course possible, but is there a way that fits better?
We can use a logical vector in arrange. Rows are then ordered by the logical value, and FALSE sorts before TRUE, so the rows where Column1 == "Other" end up last.
df %>%
  arrange(Column1 == "Other")
#  Column1   Column2 Spendings
#1  Brand1 Subbrand1       500
#2  Brand2 Subbrand2       250
#3  Brand3 Subbrand3       200
#4  Brand4 Subbrand4       150
#5  Brand5 Subbrand5       100
#6   Other      <NA>      1000
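To see why this works, here is a minimal sketch. The small data frame df_demo is made up for illustration and is not the data from the question. The comparison yields one logical value per row, and arrange sorts ascending, which places FALSE before TRUE.

```r
library(dplyr)

# Hypothetical data with "Other" in the middle, for illustration only
df_demo <- data.frame(
  Column1 = c("Brand1", "Other", "Brand2"),
  Spendings = c(500, 1000, 250)
)

# The comparison returns one logical value per row:
# TRUE only for the "Other" row
is_other <- df_demo$Column1 == "Other"

# arrange() sorts ascending and FALSE sorts before TRUE,
# so the "Other" row moves to the end
res_logical <- df_demo %>%
  arrange(Column1 == "Other")

res_logical$Column1
```

The last element of res_logical$Column1 should be "Other", whatever position it had in the input.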
Another option is to convert the column to a factor with its levels specified so that 'Other' is the last level. If we then arrange by that column, the rows are ordered by the levels. This might be the better option, because the level order is also maintained when the data is plotted.
un1 <- c(setdiff(unique(df$Column1), "Other"), "Other")
df %>%
  mutate(Column1 = factor(Column1, levels = un1)) %>%
  arrange(Column1)
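The following sketch checks that the level order really is stored on the column and not only applied once by arrange. The data frame df_demo and the name lvls are illustrative assumptions, not part of the original answer.

```r
library(dplyr)

# Hypothetical data, for illustration only
df_demo <- data.frame(
  Column1 = c("Other", "Brand1", "Brand2"),
  Spendings = c(1000, 500, 250)
)

# Every value except "Other" first, then "Other" as the last level
lvls <- c(setdiff(unique(df_demo$Column1), "Other"), "Other")

res_factor <- df_demo %>%
  mutate(Column1 = factor(Column1, levels = lvls)) %>%
  arrange(Column1)

# The order is part of the factor itself
levels(res_factor$Column1)

# Sorting by something else and then by Column1 again
# still puts "Other" last, because arrange() follows the levels
res_resorted <- res_factor %>%
  arrange(desc(Spendings)) %>%
  arrange(Column1)
```

Because the order lives in the factor levels, plotting functions that respect level order (ggplot2's discrete scales, for example) should also draw "Other" last without any extra sorting.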
If we use the forcats package, there are some useful functions, such as fct_relevel, that make it easy to modify the levels:
library(forcats)
df %>%
  mutate(Column1 = fct_relevel(Column1, "Other", after = Inf)) %>%
  arrange(Column1)
According to the examples in ?fct_relevel:
Using 'Inf' allows you to relevel to the end when the number of levels is unknown or variable (e.g. vectorised operations)
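A short sketch of what after = Inf buys us: the same call moves "Other" to the end regardless of how many levels the factor has. The two factors below are invented for illustration, and their levels are set explicitly so the starting order does not depend on the locale.

```r
library(forcats)

# Two hypothetical factors with different numbers of levels,
# both starting with "Other" as the first level
x_small <- c("Other", "Pears", "Quinces")
x_large <- c("Other", "P", "Q", "R", "S", "T")
f_small <- factor(x_small, levels = x_small)
f_large <- factor(x_large, levels = x_large)

# The identical call works for both; no level count is needed
lv_small <- levels(fct_relevel(f_small, "Other", after = Inf))
lv_large <- levels(fct_relevel(f_large, "Other", after = Inf))

lv_small
lv_large
```

With a fixed position such as after = 2 the call would have to be adjusted whenever the number of brands changes, which is the case the documentation quote refers to.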
df <- df %>%
  group_by(Column1, Column2) %>%
  summarise(Spendings = sum(Spendings)) %>%
  arrange(Column1 == "Other", desc(Spendings))
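For a self-contained check of this last pipeline, the sketch below runs it on a small invented input called raw (an assumption, since the question does not show the unsummarised data). It also passes .groups = "drop" to summarise, which assumes dplyr 1.0.0 or later and is only there to return an ungrouped result.

```r
library(dplyr)

# Hypothetical unsummarised data, for illustration only
raw <- data.frame(
  Column1 = c("Other", "Brand1", "Brand1", "Brand2", "Other"),
  Column2 = c(NA, "Subbrand1", "Subbrand1", "Subbrand2", NA),
  Spendings = c(600, 300, 200, 250, 400)
)

res_summary <- raw %>%
  group_by(Column1, Column2) %>%
  # .groups = "drop" (dplyr >= 1.0.0) returns an ungrouped data frame
  summarise(Spendings = sum(Spendings), .groups = "drop") %>%
  # First key: "Other" rows last; second key: largest spendings first
  arrange(Column1 == "Other", desc(Spendings))

res_summary
```

The first sort key only separates "Other" from everything else, so desc(Spendings) decides the order among the brands while "Other" stays at the bottom even though its total is the largest.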