Merge data frames and replace NONE with values in R
I have two data.frames:
data.frame1:
CustID  FirstName  LastName  Address     DOB       City     Phone
132     Mary       K         999 Drive   1/1/2011  Chicago  888-0000
133     Mona       J         222 Road    1/4/2002  NY       999-8888
188     Jack       S         122 Street  9/2/2009  Washin   777-9999
None    Helen      L         111 Rd      1/4/2010
None    John       M         888 Lane    4/2/2002
None    Sally      K         222 Street  2/3/2002
data.frame2:
CustID  FirstName  LastName  Address     DOB       City
132     Mary       K         999 Drive   1/1/2011  Chicago
133     Mona       J         222 Road    1/4/2002  NY
188     Jack       S         122 Street  9/2/2009  Washington
3338    Helen      L         111 Rd      1/4/2010
882     John       M         888 Lane    4/2/2002
976     Sally      K         222 Street  2/3/2002
data.frame1 contains None in the CustID column. I need to replace these None values with the CustID from data.frame2, making sure that the columns FirstName, LastName, Address and DOB match in both data sets, because some names can match across the two data sets while the address and DOB differ, and those are not the same people.
I have converted these columns from factor to character (not sure if it matters) and applied the match() function, but received 0 matches (which I know is wrong). This is my code:
data.frame1$ID[match(c(data.frame2$'FirstName',
                       data.frame2$'LastName',
                       data.frame2$'DOB',
                       data.frame2$'Address'),
                     c(data.frame1$'FirstName',
                       data.frame1$'LastName',
                       data.frame1$'DOB',
                       data.frame1$'Address'))]
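A note on why this attempt cannot match whole rows: c() concatenates the four columns into one long vector, so match() compares single values (a first name against a first name, an address against an address) and returns positions in that long vector, not row numbers. In the data shown the column is also called CustID, so data.frame1$ID would be NULL unless an ID column exists elsewhere. A minimal sketch of a row-wise match, assuming toy data frames named df1 and df2 (hypothetical names and values, including the extra "Helen L" at another address) and a separator character that does not occur in the data:

```r
# Toy data modelled on the question; df1/df2 and the row with CustID 555 are made up.
df1 <- data.frame(
  CustID    = c("132", "None", "None"),
  FirstName = c("Mary", "Helen", "John"),
  LastName  = c("K", "L", "M"),
  Address   = c("999 Drive", "111 Rd", "888 Lane"),
  DOB       = c("1/1/2011", "1/4/2010", "4/2/2002"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  CustID    = c("132", "555", "3338", "882"),
  FirstName = c("Mary", "Helen", "Helen", "John"),
  LastName  = c("K", "L", "L", "M"),
  Address   = c("999 Drive", "5 Other Ave", "111 Rd", "888 Lane"),
  DOB       = c("1/1/2011", "7/7/1990", "1/4/2010", "4/2/2002"),
  stringsAsFactors = FALSE
)

key_cols <- c("FirstName", "LastName", "Address", "DOB")

# Build one key string per row; "\r" is an arbitrary separator assumed absent from the data.
key1 <- do.call(paste, c(df1[key_cols], sep = "\r"))
key2 <- do.call(paste, c(df2[key_cols], sep = "\r"))

# For each row of df1, the row of df2 with the same key (NA when there is none).
idx <- match(key1, key2)

# Fill only the rows whose CustID is "None" and that found a match.
fill <- df1$CustID == "None" & !is.na(idx)
df1$CustID[fill] <- df2$CustID[idx[fill]]
```

The second "Helen L" in df2 lives at a different address with a different DOB, so her CustID (555) is not picked up; only the row that agrees on all four columns is used.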
This code should illustrate how to proceed:
Example
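# Example data: df1 has missing ids, df2 is the lookup table
df1 <- data.frame(id=c(NA, 12, NA, 13),
                  fname=c("A","B","Z","D"),
                  lname=c("1","2","3","4"))
df2 <- data.frame(id=c(1, 21, 33, 44),
                  fname=c("Z","A","A","Z") ,
                  lname=c("1","3","1","3"))
# Merge the rows of df1 that lack an id with df2 on the name columns,
# then write the matched ids (id.y, from df2) back into those rows.
# Caveat: merge() sorts its result by the key columns, so this relies on that
# order matching the order of the NA rows and on each NA row having one match.
df1[!complete.cases(df1),1] <- merge(
  x=df1[!complete.cases(df1[,"id"]),],
  y=df2,
  by=c("fname", "lname"))[,"id.y"]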
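Because merge() sorts its output on the by columns by default, the positional assignment above only lines up when the NA rows already appear in key order. A hedged sketch of an order-independent variant follows; it carries a row index through the merge and writes each id back by that index. The helper column .row is a hypothetical name, and the example data is the same as above except that the two NA rows of df1 are swapped to show the effect:

```r
# Same toy data as the example, but with the NA rows of df1 in non-sorted key order.
df1 <- data.frame(id    = c(NA, 12, NA, 13),
                  fname = c("Z", "B", "A", "D"),
                  lname = c("3", "2", "1", "4"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(id    = c(1, 21, 33, 44),
                  fname = c("Z", "A", "A", "Z"),
                  lname = c("1", "3", "1", "3"),
                  stringsAsFactors = FALSE)

# Remember where each row sits; .row is a helper column invented for this sketch.
df1$.row <- seq_len(nrow(df1))

# Rows that need an id, without their (empty) id column to avoid id.x/id.y suffixes.
missing_id <- df1[is.na(df1$id), c(".row", "fname", "lname")]

# Look the ids up in df2; rows without a match simply drop out of hits.
hits <- merge(missing_id, df2, by = c("fname", "lname"))

# Write every looked-up id back to the row it came from, whatever order merge() used.
df1$id[hits$.row] <- hits$id
df1$.row <- NULL
```

This still assumes each key occurs at most once in df2; with duplicate keys, hits would hold several candidate ids for the same row and the last one would win.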
Here is one way using dplyr.
library(dplyr)
df1 <- read.table(text =
"CustID FirstName LastName Address DOB City Phone
132 Mary K 999Drive 1/1/2011 Chicago 888-0000
133 Mona J 222Road 1/4/2002 NY 999-8888
188 Jack S 122Street 9/2/2009 Washin 777-9999
None Helen L 111Rd 1/4/2010 '' ''
None John M 888Lane 4/2/2002 '' ''
None Sally K 222Street 2/3/2002 '' ''"
, header = TRUE, stringsAsFactors = FALSE)
df2 <- read.table(text=
"CustID FirstName LastName Address DOB City
132 Mary K 999Drive 1/1/2011 Chicago
133 Mona J 222Road 1/4/2002 NY
188 Jack S 122Street 9/2/2009 Washington
3338 Helen L 111Rd 1/4/2010 ''
882 John M 888Lane 4/2/2002 ''
976 Sally K 222Street 2/3/2002 ''"
, header = TRUE, stringsAsFactors = FALSE)
# Join on all identifying columns, then take df2's CustID only where df1 has "None"
df1 %>% left_join(df2 %>% select(-City), by = c('FirstName', 'LastName', 'DOB', 'Address')) %>%
  mutate(CustID = ifelse(CustID.x == "None", as.character(CustID.y), CustID.x)) %>%
  select(-CustID.x, -CustID.y)
  FirstName LastName   Address      DOB    City    Phone CustID
1      Mary        K  999Drive 1/1/2011 Chicago 888-0000    132
2      Mona        J   222Road 1/4/2002      NY 999-8888    133
3      Jack        S 122Street 9/2/2009  Washin 777-9999    188
4     Helen        L     111Rd 1/4/2010                    3338
5      John        M   888Lane 4/2/2002                     882
6     Sally        K 222Street 2/3/2002                     976
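One thing worth checking before a join like this: a left join returns one output row per matching pair, so if the lookup table held two rows with the same FirstName, LastName, DOB and Address, the result would have more rows than df1. A small base R sketch of such a check is below; the data frame and the duplicate customer 4001 are made up for illustration and are not part of the original data.

```r
# Hypothetical lookup table with one deliberately duplicated key (John M).
df2 <- data.frame(
  CustID    = c(3338, 882, 976, 4001),
  FirstName = c("Helen", "John", "Sally", "John"),
  LastName  = c("L", "M", "K", "M"),
  Address   = c("111Rd", "888Lane", "222Street", "888Lane"),
  DOB       = c("1/4/2010", "4/2/2002", "2/3/2002", "4/2/2002"),
  stringsAsFactors = FALSE
)

key_cols <- c("FirstName", "LastName", "DOB", "Address")
keys <- df2[key_cols]

# Flag every row whose key occurs more than once (first and later occurrences alike).
dup <- duplicated(keys) | duplicated(keys, fromLast = TRUE)

# Rows that would make the join ambiguous; resolve these before joining.
ambiguous <- df2[dup, ]
print(ambiguous)
```

If ambiguous has zero rows, each row of df1 can pick up at most one CustID from the join.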