
Carry over a string value based on max() of a numeric variable by group in R data.table

Is there a short and sweet way to carry over the string value of a chosen row to the whole group?

As a reference, for numeric variables I may want to carry over the value of a numeric variable (y) to all observations of its group, based on the max() value of another variable (x). I do that with:

library(data.table)

d <- data.table(id  =c('A','A','A','A','B','B','B','B','B'),
                x =c(10, 1, 4,  NA, NA, NA, NA, 9 , 23),
                y =c( 7, 6, 23, 1 , 2, NA, NA, 9 , 4),
                char=c('W','X','Y','Z','T',NA, NA, NA, 'P'))

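# aux flags the row holding the group maximum of x (rows with NA in x are FALSE)
d[,aux:=(x==max(x,na.rm=TRUE) & !is.na(x)),by=id]
# aux2 keeps y on the flagged row and is 0 (or NA) elsewhere
d[,aux2:=y*aux,by=id]
d[,y_carry_max:=max(aux2,na.rm=TRUE),by=id]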

What if, instead of carrying a numeric value, I want to carry the value of a string variable (char), based on aux (which marks the maximum value of x)?

I suppose this would be an intermediate step:

d[aux==TRUE,char_aux:=char,by=id]

How can I carry the value of char_aux over to the other rows of each group to create the variable char_carry_max?
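One way to finish the intermediate step above is to pick, within each group, the first non-missing char_aux and let data.table recycle it across the group. This is only a minimal sketch: it rebuilds the example data so that it runs on its own, and it assumes the last char value is 'P', as the desired output suggests.

```r
library(data.table)

# Example data from the question; the last `char` is assumed to be 'P'
# so that it agrees with the desired output.
d <- data.table(id   = c('A','A','A','A','B','B','B','B','B'),
                x    = c(10, 1, 4, NA, NA, NA, NA, 9, 23),
                char = c('W','X','Y','Z','T', NA, NA, NA, 'P'))

# Flag the row holding the group maximum of x, then copy char on that row only
d[, aux := (x == max(x, na.rm = TRUE) & !is.na(x)), by = id]
d[aux == TRUE, char_aux := char]

# Within each group, take the first non-missing char_aux;
# the single value is recycled to every row of the group.
d[, char_carry_max := char_aux[!is.na(char_aux)][1], by = id]
```

If a group has no flagged row, the subset is empty and `[1]` yields NA for that group.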

EDIT1: the desired output is the last column:

   id  x  y char   aux aux2 y_carry_max char_aux char_carry_max
1:  A 10  7    W  TRUE    7           7        W              W
2:  A  1  6    X FALSE    0           7       NA              W
3:  A  4 23    Y FALSE    0           7       NA              W
4:  A NA  1    Z FALSE    0           7       NA              W
5:  B NA  2    T FALSE    0           4       NA              P
6:  B NA NA   NA FALSE   NA           4       NA              P
7:  B NA NA   NA FALSE   NA           4       NA              P
8:  B  9  9   NA FALSE    0           4       NA              P
9:  B 23  4    P  TRUE    4           4        P              P

Edit2: regarding @AdagioMolto's comment: "Does each value in x correspond to a unique value in char? What if two or more rows feature x == max(x)? Which char should be taken?"

Good question. Assume they are unique for the purpose of this question. What I do in practice is add a random perturbation of a smaller order of magnitude to break ties. In the example above it would be: d[,x:=x+ (runif(.N)/1000)]
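As a possible alternative to the random perturbation, which.max() returns the position of the first maximum, so indexing with it breaks ties by row order rather than at random. The sketch below uses a small made-up table (`tied` and its values are hypothetical, not part of the question) to illustrate that behaviour.

```r
library(data.table)

# Hypothetical data with a tie: both rows with x == 5 hold the group maximum
tied <- data.table(id   = c('A', 'A', 'A'),
                   x    = c(5, 2, 5),
                   char = c('first', 'mid', 'last'))

# which.max() returns the index of the first maximum,
# so the tie is resolved in favour of the earlier row.
tied[, char_carry_max := char[which.max(x)], by = id]
```

Whether "first row wins" is acceptable depends on the data; if the choice among tied rows must be random, the perturbation approach remains an option.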

Edit3: besides the nice dplyr answer below, is there a more native data.table way of doing this?

With dplyr (and comments from @Frank and @LucasMation):

library(dplyr)

d %>% group_by(id) %>%
  mutate(char_carry_max = char[which.max(x)], y_carry_max = y[which.max(x)]) %>%
  data.table()
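The same which.max() indexing can also be written directly in data.table, which may be what Edit3 is after. The following is a sketch rather than a definitive answer: it rebuilds the example data so that it is self-contained, and it assumes the last char value is 'P', as in the desired output.

```r
library(data.table)

# Example data from the question; the last `char` is assumed to be 'P'
# so that it agrees with the desired output.
d <- data.table(id   = c('A','A','A','A','B','B','B','B','B'),
                x    = c(10, 1, 4, NA, NA, NA, NA, 9, 23),
                y    = c( 7, 6, 23, 1, 2, NA, NA, 9, 4),
                char = c('W','X','Y','Z','T', NA, NA, NA, 'P'))

# which.max() skips NA and returns the position of the first maximum,
# so one grouped assignment fills both carry-over columns by reference.
d[, `:=`(char_carry_max = char[which.max(x)],
         y_carry_max    = y[which.max(x)]),
  by = id]
```

This avoids the auxiliary columns aux and aux2 entirely. A group in which x is entirely NA is not covered by this sketch, because which.max() then returns a zero-length result.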
