R: sum of an aggregated column found in another column

Question

Given the first four columns of this data (rowid, order, line, special), I need to create a fifth column, numSpecial, like this:

rowid   order    line    special    numSpecial
1       A        01      X          1
2       B        01                 0
3       B        02      X          2
4       B        03      X          2
5       C        01      X          1
6       C        02                 0

numSpecial is the number of special lines (special = X) in that row's order, provided the row's own order-line is special; otherwise it is 0.

I first tried adding a column, orderX, that simply concatenates order with 'X'. It would look like:

orderX
AX
BX
BX
BX
CX
CX

Then I summed how many order-plus-special values appear in orderX:

df$numSpecial <- sum(paste(order, special, sep = "") %in% orderx)

But that doesn't work. It returns the same total, computed over all rows, for every order:

numSpecial
4
4
4
4
4
4

I then tried as.data.table, but I'm not getting the expected results with:

as.data.table(mydf)[, numSpecial := sum(paste(order, special, sep = "") %in% orderx), by = rowid]

However, that returns just 1 (or 0) for each row rather than the per-order sums:

numSpecial
1
0
1
1
1
0

Where am I going wrong with these? I don't think I should have to create that orderX column either, but I can't figure out how to get this count right. It's similar to a COUNTIF in Excel, which is easy to do.

Answer 1

There are probably several ways, but you could just multiply the per-order count by a TRUE/FALSE flag for "X" being present:

dat[, numSpecial := sum(special == "X") * (special == "X"), by=order]
dat

#   rowid order line special numSpecial
#1:     1     A    1       X          1
#2:     2     B    1                  0
#3:     3     B    2       X          2
#4:     4     B    3       X          2
#5:     5     C    1       X          1
#6:     6     C    2                  0
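The multiplication works because R coerces TRUE/FALSE to 1/0 in arithmetic. A minimal base R sketch for a single order (the vector is order B from the question; the variable names are illustrative only):

```r
# special values for the three lines of order B
special <- c("", "X", "X")

flag <- special == "X"       # FALSE TRUE TRUE
n_special <- sum(flag)       # TRUE counts as 1, so this is 2
result <- n_special * flag   # 0 2 2: the count survives only on "X" rows
```

Grouping with by=order simply repeats this calculation once per order.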

You could also do it a bit differently:

dat[, numSpecial := 0L][special == "X", numSpecial := .N, by=order]

Where dat was:

library(data.table)
dat <- structure(list(rowid = 1:6, order = c("A", "B", "B", "B", "C", 
"C"), line = c(1L, 1L, 2L, 3L, 1L, 2L), special = c("X", "", 
"X", "X", "X", "")), .Names = c("rowid", "order", "line", "special"
), row.names = c(NA, -6L), class = "data.frame")
setDT(dat)
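As a quick sanity check, the two variants above can be compared side by side. This is only a sketch: it rebuilds the same six rows with data.table() rather than structure(), and v1, v2 and same are names made up for the comparison.

```r
library(data.table)

dat <- data.table(
  rowid   = 1:6,
  order   = c("A", "B", "B", "B", "C", "C"),
  line    = c(1L, 1L, 2L, 3L, 1L, 2L),
  special = c("X", "", "X", "X", "X", "")
)

# Variant 1: per-order count of "X", zeroed on the non-special rows
v1 <- copy(dat)[, numSpecial := sum(special == "X") * (special == "X"), by = order]

# Variant 2: start at 0, then write the group size (.N) into the "X" rows only
v2 <- copy(dat)[, numSpecial := 0L][special == "X", numSpecial := .N, by = order]

same <- all(v1$numSpecial == v2$numSpecial)
```

copy() is used so that := does not modify dat by reference while comparing.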

Answer 2

You could use ave with a dummy variable (just filled with 1s):

df$numSpecial <- ifelse(df$special == "X", ave(rep(1,nrow(df)), df$order, df$special, FUN = length), 0)

df
#  rowid order line special numSpecial
#1     1     A    1       X          1
#2     2     B    1                  0
#3     3     B    2       X          2
#4     4     B    3       X          2
#5     5     C    1       X          1
#6     6     C    2                  0

Note that I read in your data without the numSpecial column.
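To see what the ave call contributes before ifelse zeroes out the non-special rows, here is a small sketch on plain vectors (the vectors copy the question's data; group_size is an illustrative name):

```r
order   <- c("A", "B", "B", "B", "C", "C")
special <- c("X", "", "X", "X", "X", "")

# ave() applies FUN within each (order, special) combination and
# returns the result aligned with the original rows
group_size <- ave(rep(1, length(order)), order, special, FUN = length)

# rows 2 and 6 are the only non-special lines of their orders, so they
# get 1 here; ifelse() is what turns them into 0
numSpecial <- ifelse(special == "X", group_size, 0)
```

Here group_size should come out as 1 1 2 2 1 1 and numSpecial as 1 0 2 2 1 0.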

Answer 3

Using the dplyr package:

library(dplyr)

df %>% group_by(order) %>% 
  mutate(numSpecial = ifelse(special=="X", sum(special=="X"), 0))

#  rowid order special numSpecial
#1     1     A       X          1
#2     2     B                  0
#3     3     B       X          2
#4     4     B       X          2
#5     5     C       X          1
#6     6     C                  0

Answer 4

Another option using only base R would be aggregate:

# Your data
df <- data.frame(rowid = 1:6, order = c("A", "B", "B", "B", "C", "C"), special = c("X", "", "X", "X", "X", ""))

# Make the counts    
dat <- with(df,aggregate(x=list(answer=special),by=list(order=order,special=special),FUN=function(x) sum(x=="X")))

# Merge back to original dataset:
dat.fin <- merge(df,dat,by=c('order','special'))
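Note that merge sorts its result by the key columns, so the rows no longer come back in rowid order, and the count column above is called answer. A possible variation, only a sketch, that names the column numSpecial directly and restores the original order (counts and res are illustrative names):

```r
df <- data.frame(rowid = 1:6,
                 order = c("A", "B", "B", "B", "C", "C"),
                 special = c("X", "", "X", "X", "X", ""))

# Count the "X" rows within each (order, special) combination
counts <- aggregate(list(numSpecial = df$special == "X"),
                    by = list(order = df$order, special = df$special),
                    FUN = sum)

# Merge the counts back, then restore the original row and column order
res <- merge(df, counts, by = c("order", "special"))
res <- res[order(res$rowid), c("rowid", "order", "special", "numSpecial")]
```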

4 answers

Answer 1  2  2017-11-20 21:55:29
Answer 2  1  accepted  2017-11-20 21:52:36
Answer 3  1  2017-11-20 22:14:02
Answer 4  0  2017-11-20 21:45:09
