Pandas - Check if a column label exists in another column's value and update the column
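I have a long list of glossary terms, and I want to check whether each paragraph contains each term, marking 1 for yes and 0 for no. A simplified version: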
>>> import pandas as pd
>>> glossary = ['phrase 1', 'phrase 2', 'phrase 3']
>>> glossary
['phrase 1', 'phrase 2', 'phrase 3']
>>> df = pd.DataFrame(['This is a phrase 1 and phrase 2', 'phrase 1',
...                    'phrase 3', 'phrase 1 & phrase 2. phrase 3 as well'],
...                   columns=['text'])
>>> df
                                    text
0        This is a phrase 1 and phrase 2
1                               phrase 1
2                               phrase 3
3  phrase 1 & phrase 2. phrase 3 as well
After concatenating the glossary terms as columns, it looks like this:
                                    text  phrase 1  phrase 2  phrase 3
0        This is a phrase 1 and phrase 2       NaN       NaN       NaN
1                               phrase 1       NaN       NaN       NaN
2                               phrase 3       NaN       NaN       NaN
3  phrase 1 & phrase 2. phrase 3 as well       NaN       NaN       NaN
I want each glossary column to be compared against the text column and set to 1 if the term appears in the text, otherwise 0. In this case:
                                    text  phrase 1  phrase 2  phrase 3
0        This is a phrase 1 and phrase 2         1         1         0
1                               phrase 1         1         0         0
2                               phrase 3         0         0         1
3  phrase 1 & phrase 2. phrase 3 as well         1         1         1
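Could you tell me how to achieve this? My real DataFrame has around 3000 glossary columns, so I would also like the logic to be general: each column label should act as the key that is looked up in the corresponding text of every row.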
You can use a list comprehension with str.contains, then concat the results and cast to int to get a 0/1 DataFrame:
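# One boolean Series per glossary term (each term is treated as a regex by default)
L = [df['text'].str.contains(x) for x in glossary]
# Combine side by side, label the columns with the terms, convert True/False to 1/0
df1 = pd.concat(L, axis=1, keys=glossary).astype(int)
print(df1)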
   phrase 1  phrase 2  phrase 3
0         1         1         0
1         1         0         0
2         0         0         1
3         1         1         1
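One caveat with real glossaries: str.contains interprets its pattern as a regular expression unless told otherwise, so a term containing characters such as '.' or '+' may match more than intended or raise an error. A minimal sketch of the same idea with literal matching, assuming the terms are meant as plain substrings (the names flags and literal_check are only illustrative, and na=False is added on the assumption that missing text should count as "not found"):

```python
import pandas as pd

glossary = ['phrase 1', 'phrase 2', 'phrase 3']
df = pd.DataFrame({'text': ['This is a phrase 1 and phrase 2',
                            'phrase 1',
                            'phrase 3',
                            'phrase 1 & phrase 2. phrase 3 as well']})

# regex=False: match each term as a plain substring, not as a pattern.
# na=False: rows with missing text become False instead of NaN,
# so the cast to int below cannot fail on them.
flags = [df['text'].str.contains(term, regex=False, na=False)
         for term in glossary]
df1 = pd.concat(flags, axis=1, keys=glossary).astype(int)
print(df1)

# Why it matters: as a regex, '.' matches any character.
literal_check = pd.Series(['a.b', 'axb']).str.contains('a.b', regex=False)
print(literal_check.tolist())  # [True, False]
```

If the terms really are meant as patterns, the default regex behaviour in the answer is the right choice.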
Then join it back to the original DataFrame:
df = df.join(df1)
print(df)
                                    text  phrase 1  phrase 2  phrase 3
0        This is a phrase 1 and phrase 2         1         1         0
1                               phrase 1         1         0         0
2                               phrase 3         0         0         1
3  phrase 1 & phrase 2. phrase 3 as well         1         1         1
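If the glossary columns already exist in the DataFrame as NaN placeholders, as in the question, the column labels themselves can serve as the lookup keys. A rough sketch under that assumption (the reindex call only recreates the question's starting shape, and every column other than 'text' is assumed to be a glossary label):

```python
import pandas as pd

glossary = ['phrase 1', 'phrase 2', 'phrase 3']
df = pd.DataFrame({'text': ['This is a phrase 1 and phrase 2',
                            'phrase 1',
                            'phrase 3',
                            'phrase 1 & phrase 2. phrase 3 as well']})

# Recreate the question's starting point: one all-NaN column per term.
df = df.reindex(columns=['text'] + glossary)

# Use each existing column label as the substring to look for in 'text'.
for label in df.columns.drop('text'):
    df[label] = (df['text']
                 .str.contains(label, regex=False, na=False)
                 .astype(int))

print(df)
```

This overwrites the placeholder columns in place, so no separate join is needed afterwards.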