Replacing values in one column after matching values from another column
My data looks something like this. What I want to do now is replace the "Old ID" values with the matching values from the second table. The first table is:
Old ID | Usage
211 | 25
211 | 17
211 | 18
202 | 11
202 | 12
194 | 17
202 | 16
194 | 22
194 | 84
198 | 26
The second table holds the matching values:
Old ID | ID
211 | abf
202 | rdg
194 | ufe
198 |
Each value in the first table's Old ID column should be replaced with the corresponding ID from the second table. If the ID is missing or "NULL", the replacement should be shown as "N/A". The first table should then look like this:
Old ID | Usage
abf | 25
abf | 17
abf | 18
rdg | 11
rdg | 12
ufe | 17
rdg | 16
ufe | 22
ufe | 84
N/A | 26
I have around 2 million such entries. Thanks a lot for your help.
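A minimal base R sketch of the requested replacement, assuming the two tables are data frames named `tbl1` and `tbl2` (hypothetical names) with the key column called `Old_ID`. `match()` finds each key's row in the lookup table in a single vectorized call, which should scale to a couple of million rows much better than a per-row loop. Missing, empty, or literal "NULL" IDs are turned into "N/A" as the question asks.

```r
# Hypothetical names: tbl1 holds the usage data, tbl2 is the lookup table
tbl1 <- data.frame(Old_ID = c(211, 211, 211, 202, 202, 194, 202, 194, 194, 198),
                   Usage  = c(25, 17, 18, 11, 12, 17, 16, 22, 84, 26))
tbl2 <- data.frame(Old_ID = c(211, 202, 194, 198),
                   ID     = c("abf", "rdg", "ufe", NA),
                   stringsAsFactors = FALSE)

# Treat a literal "NULL" or an empty string in the lookup table as missing
lookup_id <- tbl2$ID
lookup_id[lookup_id %in% c("NULL", "")] <- NA

# Row position of each Old_ID in tbl2 (NA when the key is absent from tbl2)
idx <- match(tbl1$Old_ID, tbl2$Old_ID)
new_id <- lookup_id[idx]

# Missing, NULL, or unmatched IDs are shown as "N/A"
new_id[is.na(new_id)] <- "N/A"
tbl1$Old_ID <- new_id
tbl1
```

`match()` returns the first matching position, so if the lookup table contains the same Old ID more than once, only its first ID is used.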
Something like this?
df1 <- data.frame(old.id = c(211, 211, 211, 202, 194, 202, 198, 194), usage = 20:27, stringsAsFactors = FALSE)
df2 <- data.frame(old.id = c(211, 211, 212, 213, 202, 198), ID = c("a", "a", "b", "c", "d", "e"), stringsAsFactors = FALSE)
# For each old ID, take the first matching ID from df2, or NA if there is none
df1$old.id <- sapply(df1$old.id, function(nm) {
  out <- df2$ID[df2$old.id == nm]
  if (length(out) > 0) out[1] else NA
})
df1
First merge the two tables, then remove the duplicates:
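# Join the two tables on the shared key; the result is sorted by Old_ID
S <- merge(df1, df2, by = "Old_ID")
# Drop repeated rows, then keep only ID (column 3) and Usage (column 2)
S[!duplicated(S), c(3, 2)]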
ID Usage
1 ufe 17
4 ufe 22
7 ufe 84
10 <NA> 26
11 rdg 11
14 rdg 12
17 rdg 16
20 abf 25
23 abf 17
26 abf 18
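One caveat with the merge approach: `merge()` performs an inner join by default, so rows of `df1` whose Old ID is absent from `df2` are dropped, and the result comes back sorted by the key rather than in the original order. `duplicated(S)` would also collapse genuinely repeated rows of `df1`. A sketch that avoids all three, assuming `df1`/`df2` with an `Old_ID` column as in the question; the helper column `row_order`, the extra key 999, and the repeated 211 in `df2` are hypothetical additions for illustration.

```r
df1 <- data.frame(Old_ID = c(211, 211, 211, 202, 202, 194, 202, 194, 194, 198, 999),
                  Usage  = c(25, 17, 18, 11, 12, 17, 16, 22, 84, 26, 5))  # 999: hypothetical unmatched key
df2 <- data.frame(Old_ID = c(211, 202, 194, 198, 211),                  # 211 repeated on purpose
                  ID     = c("abf", "rdg", "ufe", NA, "abf"),
                  stringsAsFactors = FALSE)

# Remember the original row order (hypothetical helper column)
df1$row_order <- seq_len(nrow(df1))

# De-duplicate the lookup table by key so the join cannot multiply rows
df2_unique <- df2[!duplicated(df2$Old_ID), ]

# Left join: all.x = TRUE keeps df1 rows with no match in df2 (their ID becomes NA)
S <- merge(df1, df2_unique, by = "Old_ID", all.x = TRUE)

# Restore the original order and label missing IDs as "N/A"
S <- S[order(S$row_order), ]
S$ID[is.na(S$ID)] <- "N/A"
result <- data.frame(ID = S$ID, Usage = S$Usage, stringsAsFactors = FALSE)
result
```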
This can be solved with an update on join:
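library(data.table)
# Update on join: where DT1$Old_ID matches DT2$Old_ID, overwrite it with DT2's ID
# (rows without a match keep their old value)
setDT(DT1)[setDT(DT2), on = "Old_ID", Old_ID := ID][]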
    Old_ID Usage
 1:    abf    25
 2:    abf    17
 3:    abf    18
 4:    rdg    11
 5:    rdg    12
 6:    ufe    17
 7:    rdg    16
 8:    ufe    22
 9:    ufe    84
10:     NA    26
# Data (input tables from the question); Old_ID is character in both so the join keys match
DT1 <- data.frame(Old_ID = c("211", "211", "211", "202", "202", "194", "202", "194", "194", "198"),
                  Usage = c(25, 17, 18, 11, 12, 17, 16, 22, 84, 26),
                  stringsAsFactors = FALSE)
DT2 <- data.frame(Old_ID = c("211", "202", "194", "198"),
                  ID = c("abf", "rdg", "ufe", NA),
                  stringsAsFactors = FALSE)
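The update join writes `NA` where the lookup ID is missing and leaves Old ID untouched for keys that are not in `DT2`, whereas the question asks for "N/A" in those cases. A sketch that covers both, assuming the `data.table` package is installed; the temporary column `new_id` and the literal "NULL" entry for 198 are hypothetical, used to mirror the question's "missing or NULL" wording.

```r
library(data.table)

DT1 <- data.table(Old_ID = c("211", "211", "211", "202", "202", "194", "202", "194", "194", "198"),
                  Usage  = c(25, 17, 18, 11, 12, 17, 16, 22, 84, 26))
DT2 <- data.table(Old_ID = c("211", "202", "194", "198"),
                  ID     = c("abf", "rdg", "ufe", "NULL"))  # hypothetical literal "NULL"

# Treat the string "NULL" as a missing ID
DT2[ID == "NULL", ID := NA_character_]

# Look up the new ID into a fresh column (hypothetical name); unmatched rows stay NA
DT1[DT2, on = "Old_ID", new_id := i.ID]

# Missing, NULL, or unmatched IDs become "N/A"; then overwrite Old_ID and drop the helper
DT1[is.na(new_id), new_id := "N/A"]
DT1[, Old_ID := new_id]
DT1[, new_id := NULL]
DT1
```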