
How to check if first word of a DataFrame string column is present in a List in Python?

I have a DataFrame df_sentences and a list question_words as follows:

df_sentences:

sentence                         label
you will not forget this movie   0
will the novel ever die          1
why we drink alcohol             1
did trump win the election       1
ambiance is perfect              0


question_words = ['what', 'why', 'when', 'where', 'whose', 'which', 'whom', 'who', 'how', 
                         'do', 'are', 'will', 'did', 'will', 'am', 'are', 'was', 'were', 'can', 'has', 'have']

I want to check whether the first word of the sentence column is present in the list question_words, and return the result in a new column ques_word.

Expected output:

sentence                         label  ques_word
you will not forget this movie   0      0
will the novel ever die          1      1
why we drink alcohol             1      1
did trump win the election       1      1
ambiance is perfect              0      0

So far I have tried .str.contains('|'.join(question_words)).astype(int), but as expected it flags every sentence that contains any of the question_words as a substring anywhere, not just as the first word.
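To make the over-matching concrete, here is a minimal sketch that rebuilds the sample data from the question and applies the substring approach. The variable names `pattern` and `substring_hits` are illustrative and not part of the original post.

```python
import pandas as pd

# Sample data copied from the question.
df_sentences = pd.DataFrame({
    "sentence": [
        "you will not forget this movie",
        "will the novel ever die",
        "why we drink alcohol",
        "did trump win the election",
        "ambiance is perfect",
    ],
    "label": [0, 1, 1, 1, 0],
})
question_words = ['what', 'why', 'when', 'where', 'whose', 'which', 'whom', 'who', 'how',
                  'do', 'are', 'will', 'did', 'will', 'am', 'are', 'was', 'were', 'can', 'has', 'have']

# Joining with '|' builds a regex alternation that can match anywhere in the string.
pattern = "|".join(question_words)
substring_hits = df_sentences["sentence"].str.contains(pattern).astype(int)

# Row 0 is flagged because of the "will" in the middle of the sentence,
# and row 4 because "ambiance" contains the substring "am".
print(substring_hits.tolist())
```

Every row ends up flagged, which is why the check has to be restricted to the first word and to whole-word matches.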

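.str.split(" ").str[0].isin(question_words).astype(int)

should do the job. Here .str[0] takes the first word of each row (a plain [0] would select the first row instead), and isin tests for an exact match against the list rather than a substring match.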
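As a runnable sketch of a vectorized first-word check, assuming the sample data from the question, the steps could look like this. The intermediate name `first_word` is an illustrative choice.

```python
import pandas as pd

# Sample data copied from the question.
df_sentences = pd.DataFrame({
    "sentence": [
        "you will not forget this movie",
        "will the novel ever die",
        "why we drink alcohol",
        "did trump win the election",
        "ambiance is perfect",
    ],
    "label": [0, 1, 1, 1, 0],
})
question_words = ['what', 'why', 'when', 'where', 'whose', 'which', 'whom', 'who', 'how',
                  'do', 'are', 'will', 'did', 'will', 'am', 'are', 'was', 'were', 'can', 'has', 'have']

# Split each sentence on whitespace and keep only the first token of each row.
first_word = df_sentences["sentence"].str.split().str[0]

# isin performs exact membership tests, so "ambiance" does not match "am".
df_sentences["ques_word"] = first_word.isin(question_words).astype(int)

print(df_sentences)
```

Calling `str.split()` with no argument splits on any run of whitespace, which also tolerates leading spaces; splitting on `" "` would yield an empty first token in that case.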

If you want a fast solution, use a list comprehension.

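# df is assumed to be the df_sentences DataFrame from the question
q_set = set(question_words)  # set membership tests are O(1) on average
df['ques_word'] = [
    # split(None, 1)[0] is the first whitespace-delimited word
    1 if w.split(None, 1)[0] in q_set else 0 for w in df.sentence
]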

df
                         sentence  label  ques_word
0  you will not forget this movie      0          0
1         will the novel ever die      1          1
2            why we drink alcohol      1          1
3      did trump win the election      1          1
4             ambiance is perfect      0          0
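The list comprehension assumes every sentence is a non-empty string; an empty string or a missing value would raise an error at `[0]` or `.split`. A hedged variant that tolerates such rows is sketched below. The helper name `starts_with_question_word`, the lowercasing step, and the extra sample rows are assumptions added for illustration, not part of the original answer.

```python
import pandas as pd

def starts_with_question_word(sentence, vocabulary):
    """Return 1 if the first word of `sentence` is in `vocabulary`, else 0.

    Hypothetical helper: non-string values (None, NaN) and blank strings yield 0.
    """
    if not isinstance(sentence, str):
        return 0
    words = sentence.split(None, 1)  # at most one split; first item is the first word
    if not words:
        return 0
    # Lowercasing is an added assumption so that "Will ..." also matches.
    return int(words[0].lower() in vocabulary)

q_set = set(['what', 'why', 'when', 'where', 'whose', 'which', 'whom', 'who', 'how',
             'do', 'are', 'will', 'did', 'am', 'was', 'were', 'can', 'has', 'have'])

# Illustrative rows, including edge cases that are not in the question's data.
df = pd.DataFrame({
    "sentence": ["Will the novel ever die", "", None, "  why we drink alcohol", "ambiance is perfect"],
})
df["ques_word"] = [starts_with_question_word(s, q_set) for s in df["sentence"]]
print(df)
```

Whether missing sentences should map to 0 or be left as missing depends on the downstream use; returning 0 here is only one reasonable choice.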

