Pandas: get all rows of dataframe A that contain a substring from dataframe B

Question

So I have 2 dataframes,

dataframe 1:

dataframe 2:

I want to get all rows of dataframe1 that contain a substring from columnB of dataframe 2:

I am using df1['columnA'].isin(df2['columnB']), but I can't get it to work.

How should I go about this?

Answer 1

You can do the following:

import pandas as pd
df1 = pd.DataFrame({"columnA":["apple, orange","pear, apple, lemon","banana, pear","cherry, pear, lemon"]})
df2 = pd.DataFrame({"columnB":["apple","cherry"]})

out = df1[df1.columnA.str.contains('|'.join(df2.columnB.values))]

Your output DataFrame will then be:

>>> out
               columnA
0        apple, orange
1   pear, apple, lemon
3  cherry, pear, lemon

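How it works

'|'.join(df2.columnB.values) produces 'apple|cherry', because it joins the values of df2's columnB with | as the separator.

str.contains then searches columnA of df1 for apple or cherry (| acts as OR in the regular expression).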
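As a hedged sketch of the same idea, the snippet below first shows why isin finds nothing (it compares whole cell values rather than substrings) and then builds the pattern with re.escape, so that any regex metacharacters in columnB would be matched literally. The names exact, pattern and mask are illustrative and not part of the original answer, and na=False is an assumption about how missing values should be treated.

```python
import re

import pandas as pd

df1 = pd.DataFrame({"columnA": ["apple, orange", "pear, apple, lemon",
                                "banana, pear", "cherry, pear, lemon"]})
df2 = pd.DataFrame({"columnB": ["apple", "cherry"]})

# isin() compares whole cell values, so no row of df1 matches here.
exact = df1["columnA"].isin(df2["columnB"])

# Escape each value so regex metacharacters are matched literally,
# then join the alternatives with "|" (regex OR).
pattern = "|".join(re.escape(value) for value in df2["columnB"])

# na=False (an assumption) treats missing values in columnA as non-matches.
mask = df1["columnA"].str.contains(pattern, na=False)
out = df1[mask]

print(pattern)
print(out)
```

With the sample data the escaped pattern is identical to the unescaped one; escaping only matters when columnB contains characters such as ., + or (.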

Answer 2

You can do this with a list comprehension:

df1[df1['columnA'].apply(lambda x: any([y for y in x for z in df2['columnB'] if y in z]))]

First, you must make sure that your comma-separated list is actually a Python list, using df1['columnA'] = df1['columnA'].str.split(',')

Full code:

import pandas as pd
df1 = pd.DataFrame({'columnA': ['apple,orange', 'pear,apple,lemon', 'banana,pear', 'cherry,pear,lemon']})
# Turn each comma-separated string into a Python list
df1['columnA'] = df1['columnA'].str.split(',')
df2 = pd.DataFrame({'columnB': ['apple', 'cherry']})
# Keep rows where any list item y is found in some df2 value z
df1 = df1[df1['columnA'].apply(lambda x: any([y for y in x for z in df2['columnB'] if y in z]))]
df1

Output:

                 columnA
0        [apple, orange]
1   [pear, apple, lemon]
3  [cherry, pear, lemon]

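The list comprehension works by checking whether any item in each row's list in df1['columnA'] is in df2['columnB']. x represents each row of df1['columnA'], y represents the individual items in that row's list, and z represents each row of df2['columnB']. any returns True if at least one list item y is in z and False otherwise. Note that y in z is a substring test: it checks whether the item y occurs within the df2 value z. The result is a boolean mask that filters out the rows marked False, that is, rows of df1['columnA'] in which no item has a match.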
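A variant sketch, under the assumption that the goal is for a df1 item to contain a df2 value (the reverse direction of y in z): it uses a generator expression inside any and strips whitespace around each item. The names needles and row_matches are hypothetical helpers introduced only for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({"columnA": ["apple,orange", "pear,apple,lemon",
                                "banana,pear", "cherry,pear,lemon"]})
df2 = pd.DataFrame({"columnB": ["apple", "cherry"]})

# Hypothetical name: the substrings to look for, taken from df2.
needles = df2["columnB"].tolist()


def row_matches(cell):
    """Return True if any comma-separated item in cell contains a needle."""
    # Split the string and drop whitespace around each item.
    items = [item.strip() for item in cell.split(",")]
    # "needle in item" tests whether the df2 value occurs inside the df1 item.
    return any(needle in item for item in items for needle in needles)


# Boolean mask: one True/False per row of df1.
mask = df1["columnA"].apply(row_matches)
out = df1[mask]
print(out)
```

Here columnA stays a plain string column, and an item such as pineapple would also match because it contains apple; whether that is wanted depends on the data.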

Answer 1: 3, accepted, 2020-07-28 21:41:37

Answer 2: 0, 2020-07-28 21:53:45
