繁体   English   中英

如果找到匹配项,则比较两个不同数据框中的列,将电子邮件从df2复制到df1

[英]Compare columns in two different data frames if match found copy email from df2 to df1

我有两个具有不同列名的数据框,每个都有10行。 我正在尝试比较列值,如果它们匹配,则将电子邮件地址从df2复制到df1。 我看了这个示例,但是我的列名不同。 如何连接(合并)数据框(内部,外部,左侧,右侧)? 我也看到了np.where 示例 ,其中使用了多个条件,但是当我这样做时,它给了我以下错误:

ValueError: Wrong number of items passed 2, placement implies 1

我想做的事:

我想做的是比较df1的第一行2列(first,last_huge)与df2列的所有行(first_small,last_small)(如果找到匹配项)从df2中的该特定列获取电子邮件地址并将其分配给df1中的新列。 谁能帮我这个忙,我只复制了下面的相关代码,仅向new_email添加5条新记录就无法完全正常工作。

最初,我将df1 ['first']与df2 ['first']进行了比较

data1 = {"first":["alice", "bob", "carol"],
         "last_huge":["foo", "bar", "baz"],
         "street_huge": ["Jaifo Road", "Wetib Ridge", "Ucagi View"],
         "city_huge": ["Egviniw", "Manbaali", "Ismazdan"],
         "age_huge": ["23", "30", "36"],
         "state_huge": ["MA", "LA", "CA"],
         "zip_huge": ["89899", "78788", "58999"]}

df1 = pd.DataFrame(data1)

data2 = {"first_small":["alice", "bob", "carol"],
         "last_small":["foo", "bar", "baz"],
         "street_small": ["Jsdffo Road", "sdf Ridge", "sdfff View"],
         "city_huge": ["paris", "london", "rome"],
         "age_huge": ["28", "40", "56"],
         "state_huge": ["GA", "EA", "BA"],
         "zip_huge": ["89859", "78728", "56999"],
         "email_small":["alice@xyz.com", "bob@abc.com", "carol@jkl.com"],
         "dob": ["31051989", "31051980", "31051981"],
         "country": ["UK", "US", "IT"],
         "company": ["microsoft", "apple", "google"],
         "source": ["bing", "yahoo", "google"]}

df2 = pd.DataFrame(data2)

df1['new_email'] = np.where((df1[['first']] == df2[['first_small']]), df2[['email_small']], np.nan)

现在它仅向new_email添加5条记录,其余的都是nan。 并显示此错误:

ValueError: Can only compare identically-labeled Series objects

尝试merge

(df1.merge(df2[["first_small", "last_small", "email_small"]], 
           how="left", 
           left_on=["first", "last_huge"], 
           right_on=["first_small", "last_small"])
    .drop(['first_small','last_small'], 1))

例:

data1 = {"first":["alice", "bob", "carol"], 
         "last_huge":["foo", "bar", "baz"]}
df1 = pd.DataFrame(data1)

data2 = {"first_small":["alice", "bob", "carol"], 
         "last_small":["foo", "bar", "baz"],
         "email_small":["alice@xyz.com", "bob@abc.com", "carol@jkl.com"]}
df2 = pd.DataFrame(data2)

(df1.merge(df2[["first_small", "last_small", "email_small"]], 
           how="left", 
           left_on=["first", "last_huge"], 
           right_on=["first_small", "last_small"])
    .drop(['first_small','last_small'], 1))

输出:

   first last_huge    email_small
0  alice       foo  alice@xyz.com
1    bob       bar    bob@abc.com
2  carol       baz  carol@jkl.com

通过使用andrew_reece的示例数据:-) pd.concat

pd.concat([df1.set_index(["first", "last_huge"]),df2.set_index(["first_small", "last_small"])['email_small']],axis=1).reset_index().dropna()
Out[23]: 
   first last_huge    email_small
0  alice       foo  alice@xyz.com
1    bob       bar    bob@abc.com
2  carol       baz  carol@jkl.com

通过使用您的数据

pd.concat([df1.set_index(["first", "last_huge"]),df2.set_index(["first_small", "last_small"])['email_small']],axis=1).reset_index()
Out[97]: 
   first last_huge age_huge city_huge state_huge  street_huge zip_huge  \
0  alice       foo       23   Egviniw         MA   Jaifo Road    89899   
1    bob       bar       30  Manbaali         LA  Wetib Ridge    78788   
2  carol       baz       36  Ismazdan         CA   Ucagi View    58999   
     email_small  
0  alice@xyz.com  
1    bob@abc.com  
2  carol@jkl.com  

使用map更新

df1['email_small']=(df1['first']+df1['last_huge']).map(df2.set_index(df2['first_small']+df2['last_small'])['email_small'])
df1
Out[115]: 
  age_huge city_huge  first last_huge state_huge  street_huge zip_huge  \
0       23   Egviniw  alice       foo         MA   Jaifo Road    89899   
1       30  Manbaali    bob       bar         LA  Wetib Ridge    78788   
2       36  Ismazdan  carol       baz         CA   Ucagi View    58999   
     email_small  
0  alice@xyz.com  
1    bob@abc.com  
2  carol@jkl.com  

暂无
暂无

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM