
Max among duplicated rows in R

I have a data set with several variables, such as ID, Time, Age, v1, v2, and v3. I need to replace the duplicated rows, where duplicates are defined by ID and Time: for rows that share the same ID and Time, take the max of each variable and write it back into the data set. I need to keep all the duplicated rows. Any advice would be appreciated.

Using dplyr:

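library(dplyr)
# Column names follow the question (ID, Time); R is case-sensitive.
# Note: mutate_each() and funs() are deprecated in current dplyr releases.
your_data %>%
  group_by(ID, Time) %>%
  mutate_each(funs = funs(max))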

If you have NA values, try:

your_data %>%
  group_by(ID, Time) %>%
  mutate_each(funs = funs(max(., na.rm = TRUE)))
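mutate_each() and funs() have since been deprecated in dplyr. Here is a minimal sketch of the same idea with across(), assuming dplyr 1.0.0 or later. The data frame is made up for illustration (including the NA in v1) and is not from the question:

```r
library(dplyr)

# Hypothetical sample data; the NA in v1 is there to exercise na.rm.
your_data <- data.frame(
  ID   = c("a", "a", "a", "b", "b"),
  Time = c(1, 1, 1, 1, 2),
  Age  = c(11, 21, 11, 4, 1),
  v1   = c(12, NA, 42, 6, 2),
  v2   = c(13, 53, 43, 7, 3)
)

result <- your_data %>%
  group_by(ID, Time) %>%
  # across() skips the grouping columns, so everything() covers Age, v1, v2.
  mutate(across(everything(), function(x) max(x, na.rm = TRUE))) %>%
  ungroup()

print(result)
```

One caveat: if every value in a group is NA, max(x, na.rm = TRUE) returns -Inf with a warning, so such groups may need separate handling.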

Happy to test or demo this on any data you provide.

My example has only v1 and v2, but you get the idea...

> head(d)
  ID Time Age v1 v2
1  a    1  11 12 13
2  a    1  21 12 53
3  a    1  11 42 43
4  b    1   4  6  7
5  b    2   1  2  3
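To reproduce the steps that follow, a data frame matching the printed rows can be built roughly like this. The values are copied from the head(d) output; the column types (character ID, numeric everything else) are an assumption, since the post does not show how d was created.

```r
# Rebuild the example data shown in the head(d) output.
# Column types are assumed; the original post only shows the printed values.
d <- data.frame(
  ID   = c("a", "a", "a", "b", "b"),
  Time = c(1, 1, 1, 1, 2),
  Age  = c(11, 21, 11, 4, 1),
  v1   = c(12, 12, 42, 6, 2),
  v2   = c(13, 53, 43, 7, 3)
)

print(d)
```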

Here we aggregate to get the max of each variable per ID/Time group:

> agg = aggregate(
          list(Age=d$Age,v1=d$v1,v2=d$v2),
          by=list(ID=d$ID,Time=d$Time),
          FUN=max
        )

> head(agg)
  ID Time Age v1 v2
1  a    1  21 42 53
2  b    1   4  6  7
3  b    2   1  2  3

Now we merge that with the first two columns (ID and Time) of our original data, so every original row gets its group's max values:

> merge(d[,c(1,2)],agg,by=c("ID","Time"))
  ID Time Age v1 v2
1  a    1  21 42 53
2  a    1  21 42 53
3  a    1  21 42 53
4  b    1   4  6  7
5  b    2   1  2  3
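One thing to watch with this approach: merge() sorts its result by the key columns by default, so rows can come back in a different order than in the original data. The output here only looks unchanged because d was already sorted by ID and Time. Below is a minimal sketch of one way to restore the original order. It assumes a temporary helper column (row_id, a name made up for this example) and a shuffled copy of the sample data, and it uses the formula form of aggregate() as a shorter spelling of the same aggregation:

```r
# Shuffled copy of the example data, so the row order is not already sorted.
d <- data.frame(
  ID   = c("b", "a", "b", "a", "a"),
  Time = c(2, 1, 1, 1, 1),
  Age  = c(1, 11, 4, 21, 11),
  v1   = c(2, 12, 6, 12, 42),
  v2   = c(3, 13, 7, 53, 43)
)

# Per-group max of each value column.
agg <- aggregate(cbind(Age, v1, v2) ~ ID + Time, data = d, FUN = max)

# Tag each original row with its position (row_id is a hypothetical helper name).
keys <- d[, c("ID", "Time")]
keys$row_id <- seq_len(nrow(keys))

# Merge, then put the rows back in their original order and drop the helper.
merged <- merge(keys, agg, by = c("ID", "Time"))
merged <- merged[order(merged$row_id), ]
merged$row_id <- NULL
rownames(merged) <- NULL

print(merged)
```

The ave() and dplyr approaches work in place and keep the row order without this extra step.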

Another alternative uses ave, applied here to @Larsenal's example data (stored as dat):

idvars <- c("ID","Time")
numvars <- setdiff(names(dat), idvars)
dat[numvars] <- lapply(dat[numvars], function(x) ave(x, dat[idvars], FUN=max))

#  ID Time Age v1 v2
#1  a    1  21 42 53
#2  a    1  21 42 53
#3  a    1  21 42 53
#4  b    1   4  6  7
#5  b    2   1  2  3
