Find values from one column in another column according to ID in R

I have a data frame with multiple entries for each ID. Each entry has a reference number (NEW_REF) and an old reference number (OLD_REF). I need to find the most recent reference number for each ID, meaning the reference number that does not appear in the old reference number column.

ID <- c(1, 2, 3, 4, 1, 3, 5, 2, 4, 1, 3, 4)
NEW_REF <- c("TS101", "TS253", "TS565", "TS789", "TD123", "TS101", "TD367", "TH152", "TD123", "TF908", "TD256", "TS898")
OLD_REF <- c("TD123", "TH152", "TS101", "TD123", "TF908", "TD256", "TG232", "TR142", "TS898", "TR268", "TB496", "TD969")
DF <- data.frame(ID, NEW_REF, OLD_REF)

DF$Active_ind <- NA
# A reference number that appears in the OLD_REF column is not active (not the most recent).
DF$Active_ind[DF$NEW_REF %in% DF$OLD_REF] <- "N"
# A reference number that does not appear in the OLD_REF column is active (the most recent).
DF$Active_ind[!(DF$NEW_REF %in% DF$OLD_REF)] <- "Y"

    ID NEW_REF OLD_REF Active_ind
1   1   TS101   TD123          N
2   2   TS253   TH152          Y
3   3   TS565   TS101          Y
4   4   TS789   TD123          Y
5   1   TD123   TF908          N
6   3   TS101   TD256          N
7   5   TD367   TG232          Y
8   2   TH152   TR142          N
9   4   TD123   TS898          N
10  1   TF908   TR268          N
11  3   TD256   TB496          N
12  4   TS898   TD969          N

My problem is that ID 1 has a new reference TS101 (row 1) and ID 3 has an old reference TS101 (row 3), so row 1 is marked N even though TS101 is not an old reference for ID 1. How do I check which reference number is the most recent per ID if the reference numbers are not unique across IDs?

I would like row 1 to have a Y in the Active_ind column:

    ID NEW_REF OLD_REF Active_ind
1   1   TS101   TD123          Y
2   2   TS253   TH152          Y
3   3   TS565   TS101          Y
4   4   TS789   TD123          Y
5   1   TD123   TF908          N
6   3   TS101   TD256          N
7   5   TD367   TG232          Y
8   2   TH152   TR142          N
9   4   TD123   TS898          N
10  1   TF908   TR268          N
11  3   TD256   TB496          N
12  4   TS898   TD969          N

I know this is possible with a for loop, but I would like to avoid one: my data set has over 40,000 different IDs, and a loop becomes very time-intensive.

We can use dplyr to group the rows by ID and then, within each group, check whether each value of NEW_REF is present in that group's OLD_REF, assigning "N" if it is and "Y" otherwise.

library(dplyr)
DF %>%
   group_by(ID) %>%
   mutate(Active_Ind = ifelse(NEW_REF %in% OLD_REF, "N", "Y"))


#     ID NEW_REF OLD_REF Active_Ind
#   <dbl>  <fctr>  <fctr>      <chr>
#      1   TS101   TD123          Y
#      2   TS253   TH152          Y
#      3   TS565   TS101          Y
#      4   TS789   TD123          Y
#      1   TD123   TF908          N
#      3   TS101   TD256          N
#      5   TD367   TG232          Y
#      2   TH152   TR142          N
#      4   TD123   TS898          N
#      1   TF908   TR268          N
#      3   TD256   TB496          N
#      4   TS898   TD969          N
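
If dplyr is not available, the same per-ID check can be sketched in base R. This is only a sketch, not part of the answer above: it assumes the separator "|" never occurs in an ID or a reference number, and the names new_key and old_key are made up for illustration. Pasting the ID onto each reference means a NEW_REF can only match an OLD_REF that belongs to the same ID, and the comparison stays vectorised, with no loop over IDs.

```r
ID <- c(1, 2, 3, 4, 1, 3, 5, 2, 4, 1, 3, 4)
NEW_REF <- c("TS101", "TS253", "TS565", "TS789", "TD123", "TS101",
             "TD367", "TH152", "TD123", "TF908", "TD256", "TS898")
OLD_REF <- c("TD123", "TH152", "TS101", "TD123", "TF908", "TD256",
             "TG232", "TR142", "TS898", "TR268", "TB496", "TD969")
DF <- data.frame(ID, NEW_REF, OLD_REF, stringsAsFactors = FALSE)

# Composite "ID|reference" keys (hypothetical helper names).
# Assumption: "|" does not appear in any ID or reference number.
new_key <- paste(DF$ID, DF$NEW_REF, sep = "|")
old_key <- paste(DF$ID, DF$OLD_REF, sep = "|")

# A new reference is inactive only if the same ID also lists it as an old reference.
DF$Active_ind <- ifelse(new_key %in% old_key, "N", "Y")

# Rows flagged "Y" hold the most recent reference for each ID.
active <- DF[DF$Active_ind == "Y", c("ID", "NEW_REF")]
print(active)
```

On the sample data this marks row 1 (ID 1, TS101) as Y, because TS101 is an old reference only for ID 3.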
