Merge data frames and replace NONE with values in R

I have two data.frames:

data.frame1:
CustID  FirstName  LastName  Address     DOB       City     Phone
132     Mary       K         999 Drive   1/1/2011  Chicago  888-0000
133     Mona       J         222 Road    1/4/2002  NY       999-8888
188     Jack       S         122 Street  9/2/2009  Washin   777-9999
None    Helen      L         111 Rd      1/4/2010
None    John       M         888 Lane    4/2/2002
None    Sally      K         222 Street  2/3/2002


data.frame2:
CustID  FirstName  LastName  Address     DOB       City
132     Mary       K         999 Drive   1/1/2011  Chicago
133     Mona       J         222 Road    1/4/2002  NY
188     Jack       S         122 Street  9/2/2009  Washington
3338    Helen      L         111 Rd      1/4/2010
882     John       M         888 Lane    4/2/2002
976     Sally      K         222 Street  2/3/2002

data.frame1 contains None in the CustID column. I need to replace these None values with the CustID from data.frame2, making sure that the FirstName, LastName, Address and DOB columns match in both data sets, because some names can match across the two data sets but have a different address and DOB; those are not the same people. I have converted these columns from factor to character (not sure if it matters) and applied the match() function, but received 0 matches (which I know is wrong). This is my code:

data.frame1$ID[match(c(data.frame2$'FirstName',
                     data.frame2$'LastName',
                     data.frame2$'DOB',
                     data.frame2$'Address'), 
                     c(data.frame1$'FirstName',
                     data.frame1$'LastName',
                     data.frame1$'DOB',
                     data.frame1$'Address'))]   

This code should illustrate how to proceed:

  • merge the data.frames by "fname" and "lname" (consider only the rows where id is missing)
  • select the "id.y" column of the merged data.frame (the id that came from df2) and copy it into df1

Example

df1 <- data.frame(id=c(NA, 12, NA, 13), 
    fname=c("A","B","Z","D"), 
    lname=c("1","2","3","4"))

df2 <- data.frame(id=c(1, 21, 33, 44), 
    fname=c("Z","A","A","Z")  , 
    lname=c("1","3","1","3"))

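# Rows of df1 whose id is missing
missing_id <- is.na(df1$id)

# merge() sorts its result by the key columns by default, so this assignment
# relies on the missing-id rows of df1 being in that order and all having a match
df1$id[missing_id] <- merge(
    x=df1[missing_id,], 
    y=df2, 
    by=c("fname", "lname"))[,"id.y"]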
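
As a side note on the match() attempt in the question: match() compares individual values, and c() concatenates the columns into one long vector instead of pairing them row by row, which is one likely reason it did not behave as expected. A minimal base R sketch of a row-wise alternative follows, reusing the toy data from the example above. The names key1, key2, pos and fill are made up for illustration, and the "|" separator assumes that character never occurs in the data.

```r
# Toy data, same values as in the example above.
df1 <- data.frame(id = c(NA, 12, NA, 13),
                  fname = c("A", "B", "Z", "D"),
                  lname = c("1", "2", "3", "4"),
                  stringsAsFactors = FALSE)

df2 <- data.frame(id = c(1, 21, 33, 44),
                  fname = c("Z", "A", "A", "Z"),
                  lname = c("1", "3", "1", "3"),
                  stringsAsFactors = FALSE)

# One composite key per row (assumption: "|" never appears in the columns).
key1 <- paste(df1$fname, df1$lname, sep = "|")
key2 <- paste(df2$fname, df2$lname, sep = "|")

# Position in df2 of the first row matching each df1 row (NA if none).
pos <- match(key1, key2)

# Fill only the rows whose id is missing and that have a match in df2.
fill <- is.na(df1$id) & !is.na(pos)
df1$id[fill] <- df2$id[pos[fill]]

df1$id
# 33 12 44 13
```

Unlike the merge() version, this does not depend on row order, and rows without a match simply keep their NA. For the data in the question, the key would be built from FirstName, LastName, DOB and Address instead.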

Here is one way using dplyr.

  library(dplyr)

  df1 <- read.table(text = 
       "CustID  FirstName   LastName    Address         DOB         City    Phone
  132    Mary         K               999Drive   1/1/2011    Chicago 888-0000
  133    Mona         J               222Road    1/4/2002    NY      999-8888
  188    Jack         S               122Street  9/2/2009    Washin  777-9999
  None    Helen       L               111Rd      1/4/2010     ''     ''
  None    John        M               888Lane    4/2/2002       ''   ''
  None    Sally       K               222Street  2/3/2002        ''  ''"
  , header = TRUE, stringsAsFactors = FALSE)


  df2 <- read.table(text=                    
  "CustID FirstName LastName Address   DOB         City
  132    Mary      K        999Drive 1/1/2011    Chicago 
  133    Mona      J         222Road 1/4/2002    NY  
  188    Jack      S      122Street  9/2/2009    Washington  
  3338    Helen   L         111Rd    1/4/2010     ''   
  882     John    M       888Lane    4/2/2002       '' 
  976    Sally    K     222Street    2/3/2002    ''"
  , header = TRUE, stringsAsFactors = FALSE)

  df1 %>%
    left_join(df2 %>% select(-City), by = c('FirstName', 'LastName', 'DOB', 'Address')) %>%
    # take the ID from df2 only where df1 has "None"; otherwise keep df1's own ID
    mutate(CustID = ifelse(CustID.x == "None", CustID.y, CustID.x)) %>%
    select(-CustID.x, -CustID.y)



  FirstName LastName   Address      DOB    City    Phone CustID
1      Mary        K  999Drive 1/1/2011 Chicago 888-0000    132
2      Mona        J   222Road 1/4/2002      NY 999-8888    133
3      Jack        S 122Street 9/2/2009  Washin 777-9999    188
4     Helen        L     111Rd 1/4/2010                    3338
5      John        M   888Lane 4/2/2002                     882
6     Sally        K 222Street 2/3/2002                     976
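
If some rows of df1 might have no counterpart in df2, one possible variation is to turn "None" into NA first and then fill the gaps with coalesce(). The following is only a sketch on made-up rows: the customer "Paul Q" is hypothetical and exists solely to show an unmatched row, Address is left out of the key for brevity, and the ".new" suffix and the name result are arbitrary choices.

```r
library(dplyr)

# Small illustrative data; "Paul Q" is a hypothetical row with no match in df2.
df1 <- data.frame(CustID = c("132", "None", "None"),
                  FirstName = c("Mary", "Helen", "Paul"),
                  LastName = c("K", "L", "Q"),
                  DOB = c("1/1/2011", "1/4/2010", "5/5/2005"),
                  stringsAsFactors = FALSE)

df2 <- data.frame(CustID = c(132L, 3338L),
                  FirstName = c("Mary", "Helen"),
                  LastName = c("K", "L"),
                  DOB = c("1/1/2011", "1/4/2010"),
                  stringsAsFactors = FALSE)

result <- df1 %>%
  # "None" becomes NA, then the column is converted to integer like df2$CustID
  mutate(CustID = as.integer(na_if(CustID, "None"))) %>%
  # df1 keeps the name CustID; the ID coming from df2 is called CustID.new
  left_join(df2, by = c("FirstName", "LastName", "DOB"),
            suffix = c("", ".new")) %>%
  # keep the existing ID where present, otherwise use the one from df2
  mutate(CustID = coalesce(CustID, CustID.new)) %>%
  select(-CustID.new)

result
```

Rows that find no match keep NA in CustID, which makes them easy to spot afterwards with is.na(result$CustID).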
