R sum of aggregate columns found in another column
Given the first four columns of this data (rowid, order, line, special), I need to create a column, numSpecial, like this:
rowid  order  line  special  numSpecial
1      A      01    X        1
2      B      01             0
3      B      02    X        2
4      B      03    X        2
5      C      01    X        1
6      C      02             0
numSpecial is the number of special lines (special = X) within the same order, but only when that order line is itself special; otherwise it is 0.
I first tried adding a column, orderX, that simply concatenates order with 'X'. It looks like:
orderX
AX
BX
BX
BX
CX
CX
Then I counted how many order & special combinations appear in orderx:
df$numSpecial <- sum(paste(order, special, sep = "") %in% orderx)
But that doesn't work; it returns the total over all rows, repeated for every row:
numSpecial
4
4
4
4
4
4
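A likely explanation, sketched in base R under the assumption that the example data is typed in by hand (the names `hits` and `total` are illustrative only, not from the question): `sum()` collapses the whole logical vector to a single number, and assigning a length-one value to a data-frame column recycles it to every row.

```r
# Example data from the question, without numSpecial
df <- data.frame(
  rowid   = 1:6,
  order   = c("A", "B", "B", "B", "C", "C"),
  special = c("X", "", "X", "X", "X", ""),
  stringsAsFactors = FALSE
)
orderx <- paste0(df$order, "X")

# hits/total are illustrative names: one TRUE/FALSE per row...
hits <- paste0(df$order, df$special) %in% orderx
print(hits)

# ...but sum() reduces the whole vector to one number
total <- sum(hits)
print(total)

# A length-one value is recycled down the entire column
df$numSpecial <- total
print(df$numSpecial)
```

The per-row information is still present in `hits`; it is the `sum()` over every row at once that discards it.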
I then tried as.data.table, but I'm not getting the expected results using:
as.data.table(mydf)[, numSpecial := sum(paste(order, special, sep = "") %in% orderx), by = rowid]
However, that returns only 1 or 0 for each row rather than per-order sums:
numSpecial
1
0
1
1
1
0
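With `by = rowid`, every group holds exactly one row, so the per-group sum can only be 0 or 1. The same effect can be reproduced in base R with `tapply()` (a sketch; `is_x`, `by_row` and `by_order` are hypothetical names). Grouping by order instead yields the per-order counts the question is after, although they still have to be zeroed on rows that are not special.

```r
df <- data.frame(
  rowid   = 1:6,
  order   = c("A", "B", "B", "B", "C", "C"),
  special = c("X", "", "X", "X", "X", ""),
  stringsAsFactors = FALSE
)
# Hypothetical helper: TRUE where the line is special
is_x <- df$special == "X"

# Grouping by rowid: six one-row groups, so each sum is 0 or 1
by_row <- tapply(is_x, df$rowid, sum)
print(by_row)

# Grouping by order: one count of "X" lines per order
by_order <- tapply(is_x, df$order, sum)
print(by_order)
```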
Where am I going wrong with these? I don't think I should have to create that orderX column either, but I can't figure out how to get this count right. It's similar to COUNTIF in Excel, which is easy to do there.
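Since the question compares this to Excel's COUNTIF, one possible base R analogue is to build a lookup table of "X" counts per order with `table()` and then look up each row's order in it. This is a minimal sketch, not taken from any of the answers; `is_x`, `x_counts` and `idx` are hypothetical names.

```r
df <- data.frame(
  rowid   = 1:6,
  order   = c("A", "B", "B", "B", "C", "C"),
  line    = c(1L, 1L, 2L, 3L, 1L, 2L),
  special = c("X", "", "X", "X", "X", ""),
  stringsAsFactors = FALSE
)
is_x <- df$special == "X"

# COUNTIF-like lookup table: number of "X" lines for each order
x_counts <- table(df$order[is_x])

# Position of each row's order in the table (NA for orders with no "X")
idx <- match(df$order, names(x_counts))

# Keep the count only on "X" rows; every other row gets 0
df$numSpecial <- ifelse(is_x, as.integer(x_counts)[idx], 0L)
print(df)
```

An "X" row's order always appears in `x_counts`, so the `NA` from `match()` can only occur on rows that `ifelse()` sets to 0 anyway.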
There are probably several ways, but you could just multiply the per-order count by a TRUE/FALSE flag for "X" being present:
# count of "X" per order, multiplied by 0/1 so non-"X" rows become 0
dat[, numSpecial := sum(special == "X") * (special == "X"), by=order]
dat
# rowid order line special numSpecial
#1: 1 A 1 X 1
#2: 2 B 1 0
#3: 3 B 2 X 2
#4: 4 B 3 X 2
#5: 5 C 1 X 1
#6: 6 C 2 0
You could also do it a bit differently:
# set every row to 0, then overwrite "X" rows with the number of "X" rows (.N) in their order
dat[, numSpecial := 0L][special == "X", numSpecial := .N, by=order]
Where dat was:
library(data.table)
dat <- structure(list(rowid = 1:6, order = c("A", "B", "B", "B", "C",
"C"), line = c(1L, 1L, 2L, 3L, 1L, 2L), special = c("X", "",
"X", "X", "X", "")), .Names = c("rowid", "order", "line", "special"
), row.names = c(NA, -6L), class = "data.frame")
setDT(dat)
You could use ave with a dummy variable (just filled with 1s):
df$numSpecial <- ifelse(df$special == "X", ave(rep(1,nrow(df)), df$order, df$special, FUN = length), 0)
df
# rowid order line special numSpecial
#1 1 A 1 X 1
#2 2 B 1 0
#3 3 B 2 X 2
#4 4 B 3 X 2
#5 5 C 1 X 1
#6 6 C 2 0
Note that I read in your data without the numSpecial column.
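In this answer, `ave()` splits the dummy column by every combination of order and special, so `length` returns the size of each (order, special) group; `ifelse()` then keeps that size only on "X" rows. A sketch that prints the intermediate vector for the example data (`group_size` is a hypothetical name):

```r
df <- data.frame(
  rowid   = 1:6,
  order   = c("A", "B", "B", "B", "C", "C"),
  special = c("X", "", "X", "X", "X", ""),
  stringsAsFactors = FALSE
)
# Size of each (order, special) group, repeated on every row of that group
group_size <- ave(rep(1, nrow(df)), df$order, df$special, FUN = length)
print(group_size)

# Blank-special rows form their own groups, so they are zeroed here
df$numSpecial <- ifelse(df$special == "X", group_size, 0)
print(df$numSpecial)
```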
Using the dplyr package:
library(dplyr)
df %>% group_by(order) %>%
mutate(numSpecial = ifelse(special=="X", sum(special=="X"), 0))
# rowid order special numSpecial
#1 1 A X 1
#2 2 B 0
#3 3 B X 2
#4 4 B X 2
#5 5 C X 1
#6 6 C 0
One other option using only base R would be to use aggregate:
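# Your data
df <- data.frame(rowid = 1:6, order = c("A", "B", "B", "B", "C", "C"), special = c("X", "", "X", "X", "X", ""))
# Count the "X" values in each (order, special) group; groups with special == "" get 0
dat <- with(df, aggregate(x = list(answer = special), by = list(order = order, special = special), FUN = function(x) sum(x == "X")))
# Merge the counts back to the original dataset
dat.fin <- merge(df, dat, by = c("order", "special"))
# merge() sorts by the key columns, so restore the original row order
dat.fin <- dat.fin[order(dat.fin$rowid), ]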