Return value if value in a column is found in another dataframe column pandas

I have two DataFrames. df1:

              Summary
0        This is a basket of red apples.
1        We found a bushel of fruit. They are red.
2        There is a peck of pears that taste sweet.
3        We have a box of plums.
4        This is bag of green apples.

df2:

      Fruits        
0    plum     
1    pear     
2    apple     
3    orange

I want the output to be:

df2:

      Fruits     Summary   
0    plum        We have a box of plums.
1    pear        There is a peck of pears that taste sweet.
2    apple       This is a basket of red apples, This is bag of green apples
3    orange

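In short, if a fruit is found in a Summary, the corresponding Summary value should be returned; otherwise return nothing or NaN.

Edit: if multiple matches are found, all of them should be returned, separated by commas.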

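  • I think finding the unique fruits in each sentence is faster than finding every sentence for each fruit.
    • Finding every sentence for each fruit requires iterating over every sentence once per fruit.
    • Presumably there are fewer unique fruits than sentences, so checking each sentence for the fruits is faster.
    • The speed relative to the alternative is an assumption and has not been tested.
  • For each 'Summary', add all found 'Fruits' to a list, because a sentence may contain more than one fruit.
  • Explode the lists into separate rows.
  • Merge df1 into df2.
  • Group by 'Fruits' and combine the sentences into a single comma-separated string.
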
import pandas as pd

# sample dataframes
df1 = pd.DataFrame({'Summary': ['This is a basket of red apples. They are sour.', 'We found a bushel of fruit. They are red.', 'There is a peck of pears that taste sweet.', 'We have a box of plums.', 'This is bag of green apples.', 'We have apples and pears']})

df2 = pd.DataFrame({'Fruits': ['plum', 'pear', 'apple', 'orange']})

# display(df1)
                                          Summary
0  This is a basket of red apples. They are sour.
1       We found a bushel of fruit. They are red.
2      There is a peck of pears that taste sweet.
3                         We have a box of plums.
4                    This is bag of green apples.
5                        We have apples and pears

# set all values to lowercase in Fruits
df2.Fruits = df2.Fruits.str.lower()

# create an array of unique Fruits from df2
unique_fruits = df2.Fruits.unique()

# for each sentence check if a fruit is in the sentence and create a list
df1['Fruits'] = df1.Summary.str.lower().apply(lambda x: [v for v in unique_fruits if v in x])

# explode the lists into separate rows; if sentences contain more than one fruit, there will be more than one row
df1 = df1.explode('Fruits').reset_index(drop=True)

# merge df1 to df2
df2_ = df2.merge(df1, on='Fruits', how='left')

# groupby fruit, into a string
df2_ = df2_.groupby('Fruits').Summary.agg(list).str.join(', ').reset_index()

# display(df2_)
   Fruits                                                                                                 Summary
0   apple  This is a basket of red apples. They are sour., This is bag of green apples., We have apples and pears
1  orange                                                                                                     NaN
2    pear                                    There is a peck of pears that taste sweet., We have apples and pears
3    plum                                                                                 We have a box of plums.
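
The steps above can also be collected into one function, as a minimal sketch. The name `summaries_by_fruit` is hypothetical and not part of the original answer. Unlike the code above, this version works on a copy (so df1 is not overwritten), drops sentences with no fruit before grouping, and merges last so that df2's row order is kept.

```python
import pandas as pd


def summaries_by_fruit(summaries: pd.DataFrame, fruits: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: one row per fruit, matching summaries joined by ', '."""
    unique_fruits = fruits['Fruits'].str.lower().unique()

    # work on a copy so the caller's frame is left untouched
    matched = summaries[['Summary']].copy()

    # for each sentence, list every fruit that occurs in it (substring match)
    matched['Fruits'] = matched['Summary'].str.lower().apply(
        lambda s: [f for f in unique_fruits if f in s]
    )

    # one row per (sentence, fruit); sentences without any fruit become NaN and are dropped
    matched = matched.explode('Fruits').dropna(subset=['Fruits'])

    # combine the sentences of each fruit into one comma-separated string
    joined = matched.groupby('Fruits')['Summary'].agg(', '.join).reset_index()

    # left merge keeps every fruit; fruits without a match get NaN
    keys = fruits.assign(Fruits=fruits['Fruits'].str.lower())
    return keys.merge(joined, on='Fruits', how='left')


df1 = pd.DataFrame({'Summary': [
    'This is a basket of red apples. They are sour.',
    'We found a bushel of fruit. They are red.',
    'There is a peck of pears that taste sweet.',
    'We have a box of plums.',
    'This is bag of green apples.',
    'We have apples and pears',
]})
df2 = pd.DataFrame({'Fruits': ['plum', 'pear', 'apple', 'orange']})

result = summaries_by_fruit(df1, df2)
print(result)
```

Because the match is a plain substring test, a fruit name would also match inside a longer word (for example 'apple' inside 'pineapple'); that is the same behaviour as the code above.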

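Alternative

  • As mentioned, my assumption is that this will be the slower option, even though it is less code, because it has to iterate over every sentence for each fruit.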
df2['Summary'] = df2.Fruits.str.lower().apply(lambda x: ', '.join([v for v in df1.Summary if x in v.lower()]))
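
Since the speed claim is stated as an untested assumption, one rough way to check it is to time both approaches with `timeit`. The sketch below is only an illustration: the function names `per_sentence` and `per_fruit`, the repeat factor of 200, and `number=5` are arbitrary choices, and the timings will depend on the data and machine. It runs the one-liner against the original (unexploded) sentences, and note that the one-liner yields an empty string for a fruit with no match, where the merge-based version yields NaN.

```python
import timeit

import pandas as pd

sentences = [
    'This is a basket of red apples. They are sour.',
    'We found a bushel of fruit. They are red.',
    'There is a peck of pears that taste sweet.',
    'We have a box of plums.',
    'This is bag of green apples.',
    'We have apples and pears',
]
# repeat the sample to get a less trivial size (arbitrary factor, an assumption)
df1 = pd.DataFrame({'Summary': sentences * 200})
df2 = pd.DataFrame({'Fruits': ['plum', 'pear', 'apple', 'orange']})


def per_sentence(df1, df2):
    """First approach: list the fruits per sentence, explode, merge, group."""
    keys = df2.assign(Fruits=df2['Fruits'].str.lower())
    fruits = keys['Fruits'].unique()
    found = df1['Summary'].str.lower().apply(lambda s: [f for f in fruits if f in s])
    exploded = df1.assign(Fruits=found).explode('Fruits')
    merged = keys.merge(exploded, on='Fruits', how='left')
    return merged.groupby('Fruits')['Summary'].agg(list).str.join(', ')


def per_fruit(df1, df2):
    """Alternative: for each fruit, scan every sentence."""
    return df2['Fruits'].str.lower().apply(
        lambda f: ', '.join(s for s in df1['Summary'] if f in s.lower())
    )


t_sentence = timeit.timeit(lambda: per_sentence(df1, df2), number=5)
t_fruit = timeit.timeit(lambda: per_fruit(df1, df2), number=5)
print(f'per sentence: {t_sentence:.4f}s, per fruit: {t_fruit:.4f}s')
```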
