
Pandas - Check if a column label exists in another column's value and update the column

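I have a long list of glossary terms, and I want to check whether each term appears in a piece of text, marking 1 for yes and 0 for no. A simplified example: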

>>> import pandas as pd
>>> glossary = ['phrase 1', 'phrase 2', 'phrase 3']
>>> glossary
['phrase 1', 'phrase 2', 'phrase 3']

>>> df = pd.DataFrame(['This is a phrase 1 and phrase 2', 'phrase 1',
...                    'phrase 3', 'phrase 1 & phrase 2. phrase 3 as well'],
...                   columns=['text'])
>>> df
                                    text
0        This is a phrase 1 and phrase 2
1                               phrase 1
2                               phrase 3
3  phrase 1 & phrase 2. phrase 3 as well

With the glossary terms added as columns, it looks like this:

                                    text  phrase 1  phrase 2  phrase 3
0        This is a phrase 1 and phrase 2       NaN       NaN       NaN
1                               phrase 1       NaN       NaN       NaN
2                               phrase 3       NaN       NaN       NaN
3  phrase 1 & phrase 2. phrase 3 as well       NaN       NaN       NaN

I want each glossary column to be compared against the text column and set to 1 if the term appears in the text, and to 0 otherwise. In this case:

                                    text  phrase 1  phrase 2  phrase 3
0        This is a phrase 1 and phrase 2         1         1         0
1                               phrase 1         1         0         0
2                               phrase 3         0         0         1
3  phrase 1 & phrase 2. phrase 3 as well         1         1         1

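Could you tell me how to achieve this? My real DataFrame has about 3,000 glossary columns, so I would also like to generalize the logic so that each column label is used as the key to compare against the text in each row.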

You can use a list comprehension with str.contains, then concat the results and cast to int to get a DataFrame of 0s and 1s:

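# One boolean Series per glossary term: True where the term occurs in the text
L = [df['text'].str.contains(x) for x in glossary]
# Place the Series side by side, label them by term, and convert True/False to 1/0
df1 = pd.concat(L, axis=1, keys=glossary).astype(int)
print(df1)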
   phrase 1  phrase 2  phrase 3
0         1         1         0
1         1         0         0
2         0         0         1
3         1         1         1
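
One caveat worth noting: str.contains treats its pattern as a regular expression by default, so a glossary term containing characters such as `.` or `(` may match more than intended or raise an error. The following is a minimal sketch, assuming literal substring matching is what is wanted. The sample terms, the sample text, and the use of `na=False` (to count missing text as "no match") are illustrative assumptions and not part of the original answer.

```python
import pandas as pd

# Hypothetical terms: as a regex, the '.' in 'v1.0' would also match 'v100'
glossary = ['v1.0', 'phrase 2']
df = pd.DataFrame({'text': ['upgrade to v1.0', 'released v100', None]})

# regex=False matches each term literally; na=False maps missing text to False
L = [df['text'].str.contains(x, regex=False, na=False) for x in glossary]
df1 = pd.concat(L, axis=1, keys=glossary).astype(int)
print(df1)
```

If the terms do need regex features such as word boundaries, escaping each term with `re.escape` before building the pattern is a common alternative.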

Then join it to the original DataFrame:

df = df.join(df1)
print(df)
                                    text  phrase 1  phrase 2  phrase 3
0        This is a phrase 1 and phrase 2         1         1         0
1                               phrase 1         1         0         0
2                               phrase 3         0         0         1
3  phrase 1 & phrase 2. phrase 3 as well         1         1         1
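
The question also asks for the column labels themselves to drive the comparison when the glossary columns already exist in the DataFrame. The following is a rough sketch of one way that might look, assuming every column other than 'text' is a glossary term and that literal matching (`regex=False`) is acceptable. The `labels` and `flags` names are introduced here for illustration only.

```python
import pandas as pd

glossary = ['phrase 1', 'phrase 2', 'phrase 3']
df = pd.DataFrame({'text': ['This is a phrase 1 and phrase 2', 'phrase 1',
                            'phrase 3', 'phrase 1 & phrase 2. phrase 3 as well']})
# Start from the layout in the question: one empty (NaN) column per term
df = df.reindex(columns=['text'] + glossary)

# Assumption: every column except 'text' is a glossary term
labels = [c for c in df.columns if c != 'text']

# Build all indicator columns at once, then overwrite the empty columns in one step
flags = pd.DataFrame(
    {c: df['text'].str.contains(c, regex=False, na=False) for c in labels},
    index=df.index,
).astype(int)
df[labels] = flags
print(df)
```

Building the indicators in a single DataFrame and assigning them together avoids inserting thousands of columns one at a time, which tends to be slow on wide frames.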
