Pandas - check whether a column label appears in another column's values and update that column

Question

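I have a long list of glossary terms. For each passage of text, I want to check whether it contains each glossary term and mark 1 for yes and 0 for no. A simplified version: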

>>> import pandas as pd
>>> glossary = ['phrase 1', 'phrase 2', 'phrase 3']
>>> glossary
['phrase 1', 'phrase 2', 'phrase 3']

>>> df = pd.DataFrame(['This is a phrase 1 and phrase 2', 'phrase 1',
...                    'phrase 3', 'phrase 1 & phrase 2. phrase 3 as well'], columns=['text'])
>>> df
                                text
0        This is a phrase 1 and phrase 2
1                               phrase 1
2                               phrase 3
3  phrase 1 & phrase 2. phrase 3 as well

After concatenating the glossary terms as columns, it looks like this:

                                    text  phrase 1  phrase 2  phrase 3
0        This is a phrase 1 and phrase 2       NaN       NaN       NaN
1                               phrase 1       NaN       NaN       NaN
2                               phrase 3       NaN       NaN       NaN
3  phrase 1 & phrase 2. phrase 3 as well       NaN       NaN       NaN
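The question does not show how the empty glossary columns were added. One way to get that layout, assuming the glossary list is the source of the column labels, is `DataFrame.reindex`, which adds any missing labels as new columns filled with NaN. This is a sketch, not necessarily what the asker did:

```python
import pandas as pd

glossary = ['phrase 1', 'phrase 2', 'phrase 3']
df = pd.DataFrame(['This is a phrase 1 and phrase 2', 'phrase 1',
                   'phrase 3', 'phrase 1 & phrase 2. phrase 3 as well'],
                  columns=['text'])

# reindex keeps 'text' and adds every glossary term as a new all-NaN column
df = df.reindex(columns=['text'] + glossary)
print(df)
```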

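I want to compare each glossary column against the text column: if the glossary term appears in the text, set the cell to 1, otherwise 0. In this case the result would be: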

                                    text  phrase 1  phrase 2  phrase 3
0        This is a phrase 1 and phrase 2         1         1         0
1                               phrase 1         1         0         0
2                               phrase 3         0         0         1
3  phrase 1 & phrase 2. phrase 3 as well         1         1         1

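How can I achieve this? My DataFrame has about 3000 glossary columns, so I would also like the logic to be general: use each column label as the key to search for in the corresponding row's text.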

Answer 1

You can build a list of boolean Series with str.contains (one per glossary term), combine them with concat, and cast the result to int to get a 0/1 DataFrame:

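# One boolean Series per glossary term; regex=False matches the term as a literal substring
L = [df['text'].str.contains(x, regex=False) for x in glossary]
# keys=glossary names each column after its term; astype(int) turns True/False into 1/0
df1 = pd.concat(L, axis=1, keys=glossary).astype(int)
print(df1)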
   phrase 1  phrase 2  phrase 3
0         1         1         0
1         1         0         0
2         0         0         1
3         1         1         1
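With a real glossary of ~3000 entries, two matching details can matter. First, `str.contains` interprets the pattern as a regular expression unless `regex=False` is passed, so a term such as `e.g.` would also match "edge". Second, any substring match, regex or literal, makes `phrase 1` match inside `phrase 10`. If whole-term matching is wanted (an assumption; the question does not say), one option is to escape each term with `re.escape` and surround it with lookarounds that forbid adjacent word characters. The glossary and texts below are hypothetical, chosen only to show both effects:

```python
import re
import pandas as pd

# Hypothetical glossary and texts illustrating the two pitfalls.
glossary = ['phrase 1', 'e.g.']
texts = pd.Series(['phrase 10 only', 'the edge case', 'see e.g. phrase 1.', 'phrase 1'])

def whole_term_pattern(term):
    # re.escape makes '.' and other metacharacters literal; the lookarounds
    # require that no letter, digit or underscore touches the term on either side.
    return r'(?<!\w)' + re.escape(term) + r'(?!\w)'

flags = pd.concat(
    [texts.str.contains(whole_term_pattern(t)) for t in glossary],
    axis=1, keys=glossary,
).astype(int)
print(flags)
```

Lookarounds are used instead of `\b` because `\b` does not behave as expected next to terms that start or end with punctuation, such as the trailing `.` in `e.g.`.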

Then join it to the original DataFrame:

df = df.join(df1)
print(df)
                                    text  phrase 1  phrase 2  phrase 3
0        This is a phrase 1 and phrase 2         1         1         0
1                               phrase 1         1         0         0
2                               phrase 3         0         0         1
3  phrase 1 & phrase 2. phrase 3 as well         1         1         1
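The question's DataFrame already holds the glossary columns (filled with NaN), and the asker wants the column labels themselves to be the search keys. A variant that reads the terms from the labels instead of from a separate glossary list could look like the sketch below. It assumes every column other than `text` is a glossary term, and it rebuilds those columns rather than writing into the NaN cells:

```python
import pandas as pd

df = pd.DataFrame(['This is a phrase 1 and phrase 2', 'phrase 1',
                   'phrase 3', 'phrase 1 & phrase 2. phrase 3 as well'],
                  columns=['text'])
# Recreate the question's layout: glossary columns already present, all NaN.
df = df.reindex(columns=['text', 'phrase 1', 'phrase 2', 'phrase 3'])

# Assumption: every column except 'text' is a glossary term; its label is the search key.
term_cols = df.columns.drop('text')

# One 0/1 Series per label, matched as a literal substring (regex=False).
flags = {col: df['text'].str.contains(col, regex=False).astype(int) for col in term_cols}

# Rebuild the frame in one step instead of assigning thousands of columns one at a time.
df = df[['text']].join(pd.DataFrame(flags, index=df.index))
print(df)
```

Creating all flag columns at once and joining them avoids growing the DataFrame column by column, which tends to be slow in pandas when there are thousands of columns.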

Answer 1 (score: 2), accepted 2017-12-23 12:52:50