Compare two columns holding lists of strings in pandas
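I have a pandas DataFrame with two columns where each row holds a comma-separated list of strings. How can I check, row by row, whether any word appears in both columns? (The flag column is the desired output.)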
A               B            flag
hello,hi,bye    bye, also    1
but, as well    see, pandas  0
I tried:
df['A'].str.contains(df['B'])
but I get this error:
TypeError: 'Series' objects are mutable, thus they cannot be hashed
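You can convert each value into a set of separate words with split and set, check for an intersection with &, then convert the result to a boolean (an empty set becomes False), and finally convert it to int, so False becomes 0 and True becomes 1.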
zipped = zip(df['A'], df['B'])
df['flag'] = [int(bool(set(a.split(',')) & set(b.split(',')))) for a, b in zipped]
print(df)
              A           B  flag
0  hello,hi,bye    bye,also     1
1   but,as well  see,pandas     0
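To see what each step contributes, the expression can be unpacked for a single pair of values. This is only a minimal sketch using the first row of the question's sample data; the variable names are illustrative and not part of the original answer.

```python
# Values taken from the first row of the question's sample data
a = 'hello,hi,bye'
b = 'bye, also'

words_a = set(a.split(','))   # {'hello', 'hi', 'bye'}
words_b = set(b.split(','))   # {'bye', ' also'}; split keeps the leading space

common = words_a & words_b    # set intersection
flag = int(bool(common))      # non-empty set -> True -> 1

print(common, flag)           # {'bye'} 1
```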
A similar solution:
import numpy as np
df['flag'] = np.array([set(a.split(',')) & set(b.split(',')) for a, b in zip(df['A'], df['B'])]).astype(bool).astype(int)
print(df)
              A            B  flag
0  hello,hi,bye    bye, also     1
1   but,as well  see, pandas     0
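One caveat when running both variants in the same session: in Python 3, zip returns a one-shot iterator, so a zipped object that has already been consumed by one list comprehension yields nothing on a second pass. A minimal illustration with made-up values:

```python
pairs = zip(['a', 'b'], ['x', 'y'])

first_pass = list(pairs)    # consumes the iterator
second_pass = list(pairs)   # nothing left to yield

print(first_pass)   # [('a', 'x'), ('b', 'y')]
print(second_pass)  # []
```

Calling zip(df['A'], df['B']) again before each use avoids ending up with an empty result.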
EDIT: There may be some whitespace before the commas, so add map with str.strip and remove the empty strings with filter:
df = pd.DataFrame({'A': ['hello,hi,bye', 'but,,,as well'],
                   'B': ['bye ,,, also', 'see,,,pandas']})
print(df)
               A             B
0   hello,hi,bye  bye ,,, also
1  but,,,as well  see,,,pandas
zipped = zip(df['A'], df['B'])
def setify(x):
    return set(map(str.strip, filter(None, x.split(','))))
df['flag'] = [int(bool(setify(a) & setify(b))) for a, b in zipped]
print(df)
               A             B  flag
0   hello,hi,bye  bye ,,, also     1
1  but,,,as well  see,,,pandas     0
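Note that setify drops empty strings before stripping, so a token made only of spaces (as in 'a, ,b') survives the filter and becomes an empty string after strip, which could then match an empty string on the other side. If that case can occur in your data, a variant that strips first may be safer. The following is a sketch under that assumption; split_words and has_common_word are hypothetical names, not part of the original answer.

```python
def split_words(text):
    """Split on commas, strip whitespace, and drop tokens that end up empty."""
    return {word.strip() for word in text.split(',') if word.strip()}


def has_common_word(a, b):
    """Return 1 if the two comma-separated strings share a word, else 0."""
    return int(bool(split_words(a) & split_words(b)))


print(has_common_word('hello,hi,bye', 'bye ,,, also'))   # 1
print(has_common_word('but,,,as well', 'see,,,pandas'))  # 0
print(has_common_word('a, ,b', 'c, ,d'))                 # 0 (blank tokens ignored)
```

With the DataFrame above it could be applied as df['flag'] = [has_common_word(a, b) for a, b in zip(df['A'], df['B'])].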