Comparing two pandas columns that contain lists of strings

Question

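I have a pandas DataFrame with two columns in which every row holds a comma-separated list of strings. How can I check, row by row, whether the two columns have any word in common? (The flag column is the desired output.)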

A                B            flag

hello,hi,bye     bye, also       1
but, as well     see, pandas     0

I tried

df['A'].str.contains(df['B'])

but I get this error:

TypeError: 'Series' objects are mutable, thus they cannot be hashed

Answer 1

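You can split each value into separate words and turn them into sets, check for an intersection with &, then convert the result to a boolean (an empty set becomes False), and finally convert that to int, so False becomes 0 and True becomes 1.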

zipped = zip(df['A'], df['B'])
df['flag'] = [int(bool(set(a.split(',')) & set(b.split(',')))) for a, b in zipped]
print (df)
              A            B  flag
0  hello,hi,bye    bye,also     1
1   but,as well  see,pandas     0
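
To make the steps of the one-liner easier to follow, here is a minimal sketch that walks through a single pair of values by hand. The variable names (words_a, words_b, common) are only illustrative and do not come from the answer.

```python
# One row's worth of data, taken from the example above
a, b = 'hello,hi,bye', 'bye,also'

words_a = set(a.split(','))   # {'hello', 'hi', 'bye'}
words_b = set(b.split(','))   # {'bye', 'also'}

common = words_a & words_b    # set intersection: {'bye'}
flag = int(bool(common))      # non-empty set -> True -> 1

print(common, flag)
```

A row with no shared word gives an empty intersection, which bool() turns into False and int() into 0.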

A similar solution:

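import numpy as np

# rebuild the zip: in Python 3 a zip iterator is exhausted after one pass
zipped = zip(df['A'], df['B'])
df['flag'] = np.array([set(a.split(',')) & set(b.split(',')) for a, b in zipped]).astype(bool).astype(int)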
print (df)
              A            B  flag
0  hello,hi,bye    bye, also     1
1   but,as well  see, pandas     0

Edit: there may be whitespace around the commas, so add map with str.strip, and use filter to drop the empty strings:

df = pd.DataFrame({'A': ['hello,hi,bye', 'but,,,as well'], 
                   'B': ['bye ,,, also', 'see,,,pandas']})
print (df)

               A             B
0   hello,hi,bye  bye ,,, also
1  but,,,as well  see,,,pandas

zipped = zip(df['A'], df['B'])

def setify(x):
    return set(map(str.strip, filter(None, x.split(','))))

df['flag'] = [int(bool(setify(a) & setify(b))) for a, b in zipped]
print (df)
               A             B  flag
0   hello,hi,bye  bye ,,, also     1
1  but,,,as well  see,,,pandas     0
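
One caveat with setify: it filters out empty strings before stripping, so a piece that contains only whitespace (as in 'a, ,b') survives the filter and becomes '' after the strip. If both columns contain such a piece, the two sets share '' and the row is flagged. A possible variant that strips first and filters afterwards is sketched below; the helper name tokens and the plain lists standing in for df['A'] and df['B'] are assumptions made for illustration.

```python
def tokens(cell):
    """Hypothetical helper: split on commas, strip each piece, drop empty pieces."""
    stripped = (part.strip() for part in cell.split(','))
    return {part for part in stripped if part}

# Plain lists standing in for df['A'] and df['B']; the second row
# contains whitespace-only pieces between the commas.
col_a = ['hello,hi,bye', 'but, ,as well']
col_b = ['bye ,,, also', 'see, ,pandas']

flags = [int(bool(tokens(a) & tokens(b))) for a, b in zip(col_a, col_b)]
print(flags)  # [1, 0]
```

The same function can be used in place of setify in the list comprehension that fills df['flag'].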

Answer 1
3 Accepted 2018-07-04 10:40:28