R sum of aggregate columns found in another column

Given this data, with its first four columns (rowid, order, line, special), I need to create a fifth column, numSpecial, like so:

rowid   order    line    special    numSpecial
1       A        01      X          1
2       B        01                 0
3       B        02      X          2
4       B        03      X          2
5       C        01      X          1
6       C        02                 0

Here numSpecial is the number of lines in the same order that are special (special = X), provided that the order line is itself special; otherwise it is 0.

I first tried adding a column that simply concatenates order with "X", call it orderX, which looks like:

orderX
AX
BX
BX
BX
CX
CX

Then I summed the order and special combinations that are found in orderX:

df$numSpecial <- sum(paste(order, special, sep = "") %in% orderx)

But that doesn't work; it returns the total across all rows, repeated for every order:

numSpecial
4
4
4
4
4
4

I then tried as.data.table, but I'm not getting the expected results with:

as.data.table(mydf)[, numSpecial := sum(paste(order, special, sep = "") %in% orderx), by = rowid]

However, that returns just 1 or 0 for each row rather than the per-order sums:

numSpecial
1
0
1
1
1
0

Where am I going wrong with these? I don't think I should have to create the orderX column either, but I can't figure out how to get this count right. It's similar to a COUNTIF in Excel, which is easy to do.

There are probably several ways, but you could just multiply the per-order count by a TRUE/FALSE flag for "X" being present:

dat[, numSpecial := sum(special == "X") * (special == "X"), by=order]
dat

#   rowid order line special numSpecial
#1:     1     A    1       X          1
#2:     2     B    1                  0
#3:     3     B    2       X          2
#4:     4     B    3       X          2
#5:     5     C    1       X          1
#6:     6     C    2                  0
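To see why the multiplication works, here is a minimal base R sketch for a single order. The vector name special_B is only for illustration; it holds the special values of order B from the example data.

```r
# Values of `special` for order B only (rows 2 to 4 of the example data)
special_B <- c("", "X", "X")

flag  <- special_B == "X"   # FALSE TRUE TRUE
count <- sum(flag)          # 2: number of special lines in this order

# Multiplying coerces the logical flag to 0/1, which zeroes the
# row that is not special itself
count * flag                # 0 2 2
```

Inside dat[..., by=order] the same arithmetic runs once per order, which is what produces the per-order counts.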

You could also do it a bit differently:

dat[, numSpecial := 0L][special == "X", numSpecial := .N, by=order]
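The chained call does two things in sequence, which may be easier to follow when split up. This is a rough sketch that rebuilds the same example data with data.table() instead of structure(); the behaviour should be the same.

```r
library(data.table)

# Same example data as in this answer
dat <- data.table(
  rowid   = 1:6,
  order   = c("A", "B", "B", "B", "C", "C"),
  line    = c(1L, 1L, 2L, 3L, 1L, 2L),
  special = c("X", "", "X", "X", "X", "")
)

# Step 1: every row starts at 0
dat[, numSpecial := 0L]

# Step 2: only rows where special == "X" are kept, then grouped by order;
# .N is the row count of each filtered group, assigned by reference
dat[special == "X", numSpecial := .N, by = order]

dat$numSpecial   # 1 0 2 2 1 0
```

Rows that fail the special == "X" filter are never touched in step 2, so they keep the 0 from step 1.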

Where dat was:

library(data.table)
dat <- structure(list(rowid = 1:6, order = c("A", "B", "B", "B", "C", 
"C"), line = c(1L, 1L, 2L, 3L, 1L, 2L), special = c("X", "", 
"X", "X", "X", "")), .Names = c("rowid", "order", "line", "special"
), row.names = c(NA, -6L), class = "data.frame")
setDT(dat)
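On the question of where the original attempts go wrong, the sketch below reproduces both behaviours. It is a simplification: it tests special == "X" directly instead of the paste(...) %in% orderx check from the question, on the assumption that the two mark the same rows in this data.

```r
library(data.table)

dat <- data.table(
  rowid   = 1:6,
  order   = c("A", "B", "B", "B", "C", "C"),
  special = c("X", "", "X", "X", "X", "")
)

# Attempt 1: no grouping, so sum() sees all six rows and returns a single
# number, which is then recycled down the whole column
total <- sum(dat$special == "X")                          # 4

# Attempt 2: grouping by a unique row id makes every group one row long,
# so each sum can only ever be 0 or 1
per_row <- dat[, sum(special == "X"), by = rowid]$V1      # 1 0 1 1 1 0

# Grouping by order is what yields the per-order counts
per_order <- dat[, sum(special == "X"), by = order]$V1    # 1 2 1
```

So the sum itself was fine in both attempts; the missing piece was by = order.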

You could use ave with a dummy variable (just filled with 1s):

df$numSpecial <- ifelse(df$special == "X", ave(rep(1,nrow(df)), df$order, df$special, FUN = length), 0)

df
#  rowid order line special numSpecial
#1     1     A    1       X          1
#2     2     B    1                  0
#3     3     B    2       X          2
#4     4     B    3       X          2
#5     5     C    1       X          1
#6     6     C    2                  0

Note that I read in your data without the numSpecial column.
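A possible variant of the same idea, sketched here as an alternative rather than as the code above, groups by order only and lets ave sum a 0/1 flag, so no dummy vector or ifelse is needed. The names is_special and per_order are made up for this example.

```r
df <- data.frame(
  rowid   = 1:6,
  order   = c("A", "B", "B", "B", "C", "C"),
  line    = c(1L, 1L, 2L, 3L, 1L, 2L),
  special = c("X", "", "X", "X", "X", ""),
  stringsAsFactors = FALSE
)

is_special <- df$special == "X"

# ave() returns a vector as long as its input, with each element replaced
# by FUN applied to its group (here: the count of special rows per order)
per_order <- ave(as.integer(is_special), df$order, FUN = sum)

# Zero out the rows that are not special themselves
df$numSpecial <- per_order * is_special
df$numSpecial   # 1 0 2 2 1 0
```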

Using the dplyr package:

library(dplyr)

df %>% group_by(order) %>% 
  mutate(numSpecial = ifelse(special=="X", sum(special=="X"), 0))
#  rowid order special numSpecial
#1     1     A       X          1
#2     2     B                  0
#3     3     B       X          2
#4     4     B       X          2
#5     5     C       X          1
#6     6     C                  0

One other option, using base R only, would be to use aggregate:

# Your data
df <- data.frame(rowid = 1:6, order = c("A", "B", "B", "B", "C", "C"), special = c("X", "", "X", "X", "X", ""))

# Make the counts    
dat <- with(df,aggregate(x=list(answer=special),by=list(order=order,special=special),FUN=function(x) sum(x=="X")))

# Merge back to original dataset:
dat.fin <- merge(df,dat,by=c('order','special'))
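Two things to be aware of with this approach: merge sorts the result by the key columns, so the rows no longer come back in rowid order, and the count lands in a column called answer. A minimal sketch of one way to tidy that up follows; the names counts and res are chosen for this example, and the count column is named numSpecial directly.

```r
df <- data.frame(
  rowid   = 1:6,
  order   = c("A", "B", "B", "B", "C", "C"),
  special = c("X", "", "X", "X", "X", ""),
  stringsAsFactors = FALSE
)

# Count special rows per (order, special) combination
counts <- aggregate(
  x   = list(numSpecial = df$special == "X"),
  by  = list(order = df$order, special = df$special),
  FUN = sum
)

# Merge back, then restore the original row and column order
res <- merge(df, counts, by = c("order", "special"))
res <- res[order(res$rowid), c("rowid", "order", "special", "numSpecial")]
res$numSpecial   # 1 0 2 2 1 0
```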
