Ordering of group_by, mutate and summarize in R
Data frame df has three columns: x, y, and n. I want to create a new data frame that groups by x, counts the number of observations in y for each group of x, and then sums the values of n for that group.
df <- structure(list(x = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 5,
5, 5), y = c(1, 2, 3, 4, 1, 2, 3, 1, 2, 3, 4, 1, 2, 1, 2, 3),
n = c(4L, 3L, 2L, 3L, 2L, 4L, 2L, 2L, 3L, 3L, 2L, 5L, 3L,
3L, 2L, 3L)), class = "data.frame", row.names = c(NA, -16L))
The target data frame looks like this, where a holds the 5 groups from the original df:
> print(df2, row.names=FALSE)
a b c
1 4 12
2 3 8
3 4 10
4 2 8
5 3 8
For some reason I'm not combining the group_by, mutate, or summarize statements in the pipe in the right order to make this happen. It feels like a simple solution that I'm not seeing right now. If anyone could help, I would appreciate it.
Here is a data.table option:
> library(data.table)
> setDT(df)[, .(b = .N, c = sum(n)), x]
x b c
1: 1 4 12
2: 2 3 8
3: 3 4 10
4: 4 2 8
5: 5 3 8
Try this:
library(dplyr)
# Group by x, then count the rows (b) and sum column n (c) within each group
new <- df %>%
  group_by(x) %>%
  summarise(b = n(), c = sum(n, na.rm = TRUE))
new
Output:
# A tibble: 5 x 3
x b c
<dbl> <int> <int>
1 1 4 12
2 2 3 8
3 3 4 10
4 4 2 8
5 5 3 8
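The ordering trouble in the question may come from the difference between mutate and summarise: mutate keeps every row and repeats the group value on each one, while summarise collapses each group to a single row. A minimal sketch, assuming dplyr is installed; d is a hypothetical stand-in for df with the same column names:

```r
library(dplyr)

# Small stand-in for df (hypothetical values, same column names)
d <- data.frame(x = c(1, 1, 2), y = c(1, 2, 1), n = c(4L, 3L, 2L))

# mutate() keeps all rows; b and c are repeated within each group
kept <- d %>%
  group_by(x) %>%
  mutate(b = n(), c = sum(n)) %>%
  ungroup()

# summarise() returns one row per group
collapsed <- d %>%
  group_by(x) %>%
  summarise(b = n(), c = sum(n))

nrow(kept)       # 3 rows, same as d
nrow(collapsed)  # 2 rows, one per value of x
```

For the target in the question, only group_by followed by summarise is needed; no mutate step is required.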
With base R, we can do:
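# by() passes each group's rows as a data frame, so count rows with nrow()
# (length() on a data frame would return the number of columns)
do.call(rbind, by(df, df$x, FUN = function(g)
  data.frame(b = nrow(g), c = sum(g$n, na.rm = TRUE))))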
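As another base R sketch, the same summary could be assembled from table() and tapply(), which both order their results by the sorted group values. This is only an illustration; d and res are hypothetical names, and the column names a, b, c follow the target layout in the question:

```r
# Small stand-in for df (hypothetical values, same column names)
d <- data.frame(x = c(1, 1, 2), y = c(1, 2, 1), n = c(4L, 3L, 2L))

res <- data.frame(
  a = sort(unique(d$x)),                 # group values
  b = as.vector(table(d$x)),             # number of rows per group
  c = as.vector(tapply(d$n, d$x, sum))   # sum of n per group
)
res
```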