Check if each column value exists in another DataFrame column where another column's value is the column header
companies.xlsx
   company    To
1  amazon     hi@test.de
2  google     bye@test.com
3  amazon     hi@tld.com
4  starbucks  hi@test.de
5  greyhound  bye@tuz.de
emails.xlsx
hi@test.de bye@test.com hi@tld.com ...
1 amazon google microsoft
2 starbucks amazon tesla
3 Grey Hound greyhound
4 ferrari
So I have the two Excel sheets above and read both of them:
import pandas as pd

# Open each workbook, then parse its 'sheet1' worksheet into a DataFrame
file1 = pd.ExcelFile('data/companies.xlsx')
file2 = pd.ExcelFile('data/emails.xlsx')
df_companies = file1.parse('sheet1')
df_emails = file2.parse('sheet1')
What I'm trying to accomplish is:
E.g. the company amazon has the To email hi@test.de in companies.xlsx. In emails.xlsx the header hi@test.de exists and amazon is also found in that column, so it's a '1'.
Does anyone know how to accomplish this?
Here's one approach. Convert df_emails to a dictionary (each column header becomes a key, and the column's values become a list) and map it onto df_companies['To']. Then check whether df_companies['company'] appears in the mapped list for that row.
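# Map each To address to the list of values under that header; unmatched addresses become ''
df_companies['check'] = df_companies['To'].map(df_emails.to_dict(orient='list')).fillna('')
# 1 if the row's company is in its mapped list, else 0
df_companies['check'] = df_companies.apply(lambda x: x['company'] in x['check'], axis=1).astype(int)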
   company    To            check
1  amazon     hi@test.de    1
2  google     bye@test.com  1
3  amazon     hi@tld.com    0
4  starbucks  hi@test.de    1
5  greyhound  bye@tuz.de    0
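If you want to try this without the Excel files, here is a minimal sketch that rebuilds the two frames in memory. It only includes the first two rows of emails.xlsx, since the column placement of the remaining values isn't clear from the question, and it assumes bye@tuz.de is not one of the headers. The names `lookup` and `has_company` are my own, and the dict lookup with `.get()` is a small variation on the `map` + `fillna('')` approach, not the same code.

```python
import pandas as pd

# Sample data from the question (emails reduced to the two unambiguous rows)
df_companies = pd.DataFrame({
    "company": ["amazon", "google", "amazon", "starbucks", "greyhound"],
    "To": ["hi@test.de", "bye@test.com", "hi@tld.com", "hi@test.de", "bye@tuz.de"],
})
df_emails = pd.DataFrame({
    "hi@test.de": ["amazon", "starbucks"],
    "bye@test.com": ["google", "amazon"],
    "hi@tld.com": ["microsoft", "tesla"],
})

# {header: [values in that column]}
lookup = df_emails.to_dict(orient="list")

def has_company(row):
    # Hypothetical helper: an address with no matching header gets an
    # empty list, so the membership test is simply False.
    return row["company"] in lookup.get(row["To"], [])

df_companies["check"] = df_companies.apply(has_company, axis=1).astype(int)
print(df_companies)
```

On this reduced sample the check column comes out as 1, 1, 0, 1, 0, the same as the output above. Note that the comparison is an exact string match, so a value such as "Grey Hound" would not match "greyhound".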