Check whether each column value exists in another DataFrame column whose header is given by another column's value

companies.xlsx

    company     To
1   amazon      hi@test.de
2   google      bye@test.com 
3   amazon      hi@tld.com
4   starbucks   hi@test.de
5   greyhound   bye@tuz.de

emails.xlsx

   hi@test.de   bye@test.com    hi@tld.com   ...
1  amazon       google          microsoft
2  starbucks    amazon          tesla
3  Grey Hound   greyhound       
4  ferrari

So I have the two Excel sheets above and read both of them:

import pandas as pd

file1 = pd.ExcelFile('data/companies.xlsx')
file2 = pd.ExcelFile('data/emails.xlsx')

df_companies = file1.parse('sheet1')
df_emails = file2.parse('sheet1')
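To experiment without the two workbooks, here is a minimal sketch that builds the same tables in memory. It assumes only the three email columns shown (the `...` columns in emails.xlsx are unknown and left out), uses pandas' default 0-based index instead of the 1-based numbering above, and pads shorter columns with NaN, which is what reading empty Excel cells gives you.

```python
import numpy as np
import pandas as pd

# Hypothetical in-memory copy of companies.xlsx (sheet1).
df_companies = pd.DataFrame({
    "company": ["amazon", "google", "amazon", "starbucks", "greyhound"],
    "To": ["hi@test.de", "bye@test.com", "hi@tld.com", "hi@test.de", "bye@tuz.de"],
})

# Hypothetical in-memory copy of emails.xlsx (sheet1): each header is an
# email address; empty cells are NaN, as read_excel would produce.
df_emails = pd.DataFrame({
    "hi@test.de": ["amazon", "starbucks", "Grey Hound", "ferrari"],
    "bye@test.com": ["google", "amazon", "greyhound", np.nan],
    "hi@tld.com": ["microsoft", "tesla", np.nan, np.nan],
})

print(df_companies)
print(df_emails)
```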

What I'm trying to accomplish is:

  1. Check whether df_companies['To'] is an existing header in df_emails.
  2. If the header exists in df_emails, search that header's column for df_companies['company'].
  3. If the company is found, add a column to df_companies and fill in '1'; if not, fill in '0'.
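The three steps can be sketched literally as a row-wise function. This is an illustrative version, not the accepted answer's method: the sample data is a hypothetical in-memory copy of the two sheets, and the helper `company_listed` and the output column `found` are names made up for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data mirroring the two sheets in the question.
df_companies = pd.DataFrame({
    "company": ["amazon", "google", "amazon", "starbucks", "greyhound"],
    "To": ["hi@test.de", "bye@test.com", "hi@tld.com", "hi@test.de", "bye@tuz.de"],
})
df_emails = pd.DataFrame({
    "hi@test.de": ["amazon", "starbucks", "Grey Hound", "ferrari"],
    "bye@test.com": ["google", "amazon", "greyhound", np.nan],
    "hi@tld.com": ["microsoft", "tesla", np.nan, np.nan],
})

def company_listed(row):
    """Hypothetical helper: 1 if row['company'] is under header row['To'], else 0."""
    header = row["To"]
    # Step 1: is the 'To' address a column header in df_emails?
    if header not in df_emails.columns:
        return 0
    # Step 2: take that header's column, ignoring empty (NaN) cells.
    listed = set(df_emails[header].dropna())
    # Step 3: 1 if the company is found (exact string match), else 0.
    return int(row["company"] in listed)

df_companies["found"] = df_companies.apply(company_listed, axis=1)
print(df_companies)
```

The match is exact, so "Grey Hound" under hi@test.de would not count as "greyhound"; normalizing case and spaces first is left as a design choice.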

E.g.: the company amazon has the To email hi@test.de in companies.xlsx. In emails.xlsx, the header hi@test.de exists and amazon is also found in that column, so it's a '1'.

Does anyone know how to accomplish this?

Here's one approach. Convert df_emails to a dictionary (header → list of column values) and map df_companies['To'] through it. Then check whether each row's df_companies['company'] is in the mapped list.

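# Map each 'To' address to the list of values under that header in df_emails;
# addresses that are not headers map to NaN, which is replaced with ''.
df_companies['check'] = df_companies['To'].map(df_emails.to_dict(orient='list')).fillna('')
# 1 if the row's company is in the mapped list, else 0.
df_companies['check'] = df_companies.apply(lambda x: x['company'] in x['check'], axis=1).astype(int)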

     company            To  check
1     amazon    hi@test.de      1
2     google  bye@test.com      1
3     amazon    hi@tld.com      0
4  starbucks    hi@test.de      1
5  greyhound    bye@tuz.de      0
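A vectorized alternative, swapped in here and not part of the answer above, is to reshape df_emails from wide to long with `DataFrame.melt` so each non-empty cell becomes a (To, company) pair, then left-merge on both keys with `indicator=True`. The sample data is a hypothetical in-memory copy of the sheets; `pairs` and `merged` are illustrative names.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data mirroring the two sheets in the question.
df_companies = pd.DataFrame({
    "company": ["amazon", "google", "amazon", "starbucks", "greyhound"],
    "To": ["hi@test.de", "bye@test.com", "hi@tld.com", "hi@test.de", "bye@tuz.de"],
})
df_emails = pd.DataFrame({
    "hi@test.de": ["amazon", "starbucks", "Grey Hound", "ferrari"],
    "bye@test.com": ["google", "amazon", "greyhound", np.nan],
    "hi@tld.com": ["microsoft", "tesla", np.nan, np.nan],
})

# Wide -> long: one row per (header, value); drop empty cells and
# duplicate pairs so the left merge cannot add extra rows.
pairs = (
    df_emails.melt(var_name="To", value_name="company")
    .dropna(subset=["company"])
    .drop_duplicates()
)

# indicator=True adds a '_merge' column that is 'both' when the
# (To, company) pair exists in df_emails, 'left_only' otherwise.
merged = df_companies.merge(pairs, on=["To", "company"], how="left", indicator=True)
df_companies["check"] = (merged["_merge"] == "both").astype(int).to_numpy()
print(df_companies)
```

Because the merge keeps the left frame's row order and the de-duplicated pairs are unique, the result lines up row for row with df_companies; like the dictionary approach, it compares exact strings.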


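Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.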

Guangdong ICP Filing No. 18138465  © 2020-2024 STACKOOM.COM