Pandas - Check if a column label exists in another column's value and update the column
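I have a long list of glossary terms, and I want to check whether each paragraph contains a term, marking 1 for yes and 0 for no. A simplified version: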
>>> glossary = ['phrase 1', 'phrase 2', 'phrase 3']
>>> glossary
['phrase 1', 'phrase 2', 'phrase 3']
>>> df = pd.DataFrame(['This is a phrase 1 and phrase 2', 'phrase 1',
...                    'phrase 3', 'phrase 1 & phrase 2. phrase 3 as well'],
...                   columns=['text'])
>>> df
                                    text
0        This is a phrase 1 and phrase 2
1                               phrase 1
2                               phrase 3
3  phrase 1 & phrase 2. phrase 3 as well
I added the glossary terms as columns, like this:
                                    text  phrase 1  phrase 2  phrase 3
0        This is a phrase 1 and phrase 2       NaN       NaN       NaN
1                               phrase 1       NaN       NaN       NaN
2                               phrase 3       NaN       NaN       NaN
3  phrase 1 & phrase 2. phrase 3 as well       NaN       NaN       NaN
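I want each glossary column to be compared against the text column and set to 1 if the term appears in the text, otherwise 0. In this case the result would be: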
                                    text  phrase 1  phrase 2  phrase 3
0        This is a phrase 1 and phrase 2         1         1         0
1                               phrase 1         1         0         0
2                               phrase 3         0         0         1
3  phrase 1 & phrase 2. phrase 3 as well         1         1         1
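Could you advise how I can achieve this? My real DataFrame has about 3000 glossary columns, so I also want to generalize the logic so that each column label is used as the key to compare against the text in each row.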
You can use a list comprehension with str.contains, combine the results with concat, and cast to int to get a DataFrame of 0s and 1s:
L = [df['text'].str.contains(x) for x in glossary]
df1 = pd.concat(L, axis=1, keys=glossary).astype(int)
print (df1)
   phrase 1  phrase 2  phrase 3
0         1         1         0
1         1         0         0
2         0         0         1
3         1         1         1
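One caveat worth illustrating: str.contains treats its pattern as a regular expression by default, so glossary terms containing characters such as `.` or `(` could match differently than intended. The following is a minimal sketch, assuming plain literal substring matching is what is wanted; the variable name `flags` is only illustrative and not part of the original answer.

```python
import pandas as pd

glossary = ['phrase 1', 'phrase 2', 'phrase 3']
df = pd.DataFrame(['This is a phrase 1 and phrase 2', 'phrase 1',
                   'phrase 3', 'phrase 1 & phrase 2. phrase 3 as well'],
                  columns=['text'])

# regex=False makes each glossary term a literal substring, not a regex pattern
flags = pd.concat(
    [df['text'].str.contains(term, regex=False) for term in glossary],
    axis=1, keys=glossary,
).astype(int)

print(flags)
```

Either way this is substring matching, so a term such as 'phrase 1' would also be found inside 'phrase 10'.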
Then join it to the original:
df = df.join(df1)
print (df)
                                    text  phrase 1  phrase 2  phrase 3
0        This is a phrase 1 and phrase 2         1         1         0
1                               phrase 1         1         0         0
2                               phrase 3         0         0         1
3  phrase 1 & phrase 2. phrase 3 as well         1         1         1
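If the DataFrame already contains the glossary columns filled with NaN, as in the question, the column labels themselves can serve as the search keys. This is a rough sketch under that assumption; the name `term_cols` is hypothetical, and it assumes every column other than 'text' is a glossary term.

```python
import pandas as pd

glossary = ['phrase 1', 'phrase 2', 'phrase 3']
df = pd.DataFrame({'text': ['This is a phrase 1 and phrase 2', 'phrase 1',
                            'phrase 3', 'phrase 1 & phrase 2. phrase 3 as well']})

# Recreate the question's layout: glossary columns exist but hold only NaN
df = df.reindex(columns=['text'] + glossary)

# Every column label except 'text' is treated as a glossary term
term_cols = df.columns.drop('text')

for col in term_cols:
    # Overwrite the placeholder column with 1 where the label occurs in the text
    df[col] = df['text'].str.contains(col, regex=False).astype(int)

print(df)
```

The loop is kept for readability; with several thousand columns, building the flag columns in a list and joining them once with concat, as shown in the answer, may be the faster route.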