Extract sentences based on regex conditions in Python

I have a dataset containing 9000 sentences, from which I need two sets of 20 sentences each, selected by some conditions. However, when I try to match those conditions, either the sentence is output or the conditions are not met. The first 20 sentences should contain one verb.

For the second part, I would like sentences that contain more than 2 verbs.

Right now I have the following code for checking whether the number of verbs is less than 2:

import re
import spacy
import en_core_web_md
nlp=en_core_web_md.load()

test = "This sentence has just 1 verb"
test2 = "I have put multiple verbs in this sentence because it is possible and I want it"

doc1 = nlp(test)
doc2 = nlp(test2)

empt = []
for item in doc1.sents:
    verbs = 0
    for token in item:
        if token.pos_ == "VERB":
            verbs += 1
            if verbs < 2:
                empt.append(item)

However, I end up with an empty list.

Can someone tell me what I am doing wrong, so I can adjust this code for every additional condition?

You just need to pull the last two lines back two indentation levels. You only want to check the number of verbs in the entire sentence after all the tokens have been considered.
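As a rough sketch of that fix, the verb count can be finished for the whole sentence first and only then compared against each condition. The helper names `count_verbs` and `split_by_verb_count`, the `limit` parameter, and the `Token` stand-in are all hypothetical and not part of the question. The stand-in only mimics the `.pos_` attribute of a spaCy token so that the example runs without a model installed, and the sample tags are written by hand rather than produced by spaCy.

```python
from collections import namedtuple

# Hypothetical stand-in for a spaCy token; only the .pos_ attribute is used.
Token = namedtuple("Token", ["text", "pos_"])


def count_verbs(sentence):
    """Count the tokens tagged VERB in one sentence (any iterable of tokens)."""
    return sum(1 for token in sentence if token.pos_ == "VERB")


def split_by_verb_count(sentences, limit=20):
    """Collect up to `limit` one-verb sentences and up to `limit` sentences
    with more than two verbs. The check runs once per sentence, after all
    of its tokens have been counted."""
    one_verb, many_verbs = [], []
    for sentence in sentences:
        verbs = count_verbs(sentence)
        if verbs == 1 and len(one_verb) < limit:
            one_verb.append(sentence)
        elif verbs > 2 and len(many_verbs) < limit:
            many_verbs.append(sentence)
    return one_verb, many_verbs


# Hand-tagged sample sentences (assumed tags, not real tagger output).
sample = [
    [Token("This", "DET"), Token("sentence", "NOUN"),
     Token("has", "VERB"), Token("verb", "NOUN")],
    [Token("I", "PRON"), Token("put", "VERB"), Token("verbs", "NOUN"),
     Token("because", "SCONJ"), Token("I", "PRON"), Token("want", "VERB"),
     Token("and", "CCONJ"), Token("need", "VERB"), Token("it", "PRON")],
]

one_verb, many_verbs = split_by_verb_count(sample)
```

With spaCy itself, the same functions should accept `nlp(text).sents` directly, since each sentence span yields tokens with a `.pos_` attribute. Depending on the model, auxiliaries such as "have" or "is" may be tagged `AUX` rather than `VERB`, so it is worth printing the tags for a few sentences before relying on the counts.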
