
Python regex fullmatch doesn't work as expected

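I have a text file containing some sentences. I check whether each one is a valid sentence according to some rules and write "valid" or "not valid" to a separate text file. My main problem is that when I press Ctrl+F and type my regex into the search bar, it matches the strings I want to match, but in code it behaves incorrectly. Here is my code: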

import re

pattern = re.compile('(([A-Z])[a-z\s,]*)((: ["‘][a-z,!?\.\s]*["’][.,!?])|(; [a-zA-Z\s]*[!.?])|(\s["‘][a-z,.;!?\s]*["’])|([\.?!]))')
text=open('validSentences',"w+")
with open('sentences.txt',encoding='utf8') as file:
    lines = file.readlines()
    for line in lines:
        matches = pattern.fullmatch(line)
        if(matches==None):
            text.write("not valid"+"\n")
        else:
            text.write("valid"+"\n") 
    file.close()

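The documentation says fullmatch matches only if the entire string matches, which is what I want, but this code writes "not valid" for every sentence I have.
The text file I have: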

How can you say that to me? 
As he looked at his reflection in the mirror, he took a deep breath. 
He nodded at himself and, feeling braver, he stepped outside the bathroom. He bumped straight into the 
extremely tall man, who was waiting by the door. 
David said ‘Oh, sorry!’. 
The happy pair discussed their future life 2gether and shared sweet words of admiration. 
We will not stop you; I promise! 
Come here ASAP! 
He pushed his chair back and went to the kitchen at 2 pM. 
I do not know... 
The main character in the movie said: "Play hard. Work harder." 

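When I type my regex into Ctrl+F in VS Code, the first, second, fourth, seventh, and eighth lines are highlighted in full, so according to fullmatch() they should be written as "valid", but they are not. I need help solving this.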

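First, remove lines = file.readlines(): it reads the whole file into memory and moves the file handle to the end of the stream, so it is simpler to iterate over the file object directly. Then, remember that when you loop with for line in lines: (or for line in file:), the line variable keeps its trailing newline. fullmatch requires the entire string to match, so that newline alone makes every line fail. Either

  • use line = line.rstrip() to strip trailing whitespace before running the regex, or
  • make sure your pattern ends with \n? (an optional newline), or even \s* (zero or more whitespace characters).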
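
To see the effect in isolation, here is a minimal sketch. It uses a deliberately simplified pattern rather than the one from the question, and the sample string is made up. It assumes the line ends with a space as well as a newline, which is what the sample file appears to contain; in that case rstrip('\n') alone is not enough and rstrip() is needed.

```python
import re

# Simplified stand-in pattern: a capital letter, lowercase words, one end mark.
simple = re.compile(r'[A-Z][a-z\s,]*[.?!]')

# Assumed input: a line as read from a file, with a trailing space and newline.
line = "How are you? \n"

as_read = simple.fullmatch(line)                     # None: " \n" follows the "?"
newline_only = simple.fullmatch(line.rstrip('\n'))   # None: the trailing space remains
all_ws = simple.fullmatch(line.rstrip())             # matches "How are you?"

print(as_read, newline_only, all_ws is not None)
```

If the real file has no trailing spaces, rstrip('\n') and rstrip() behave the same for this purpose.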

So, one possible solution looks like this:

with open('sentences.txt',encoding='utf8') as file:
    for line in file:
        matches = pattern.fullmatch(line.rstrip('\n'))
...
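
For reference, the first option could be filled out roughly as follows. This is only a sketch: the helper names classify and classify_file are hypothetical, the pattern is the one from the question written as a raw string, and rstrip() is used instead of rstrip('\n') on the assumption that trailing spaces should be ignored too.

```python
import re

# Pattern copied from the question, written as a raw string.
pattern = re.compile(r'(([A-Z])[a-z\s,]*)((: ["‘][a-z,!?\.\s]*["’][.,!?])|(; [a-zA-Z\s]*[!.?])|(\s["‘][a-z,.;!?\s]*["’])|([\.?!]))')

def classify(lines):
    """Return 'valid' or 'not valid' per line, ignoring trailing whitespace."""
    return ["valid" if pattern.fullmatch(line.rstrip()) else "not valid"
            for line in lines]

def classify_file(src, dst):
    """Hypothetical helper: read sentences from src, write one verdict per line to dst."""
    # Both files are closed automatically when the with block ends.
    with open(src, encoding='utf8') as infile, \
         open(dst, 'w', encoding='utf8') as outfile:
        for verdict in classify(infile):
            outfile.write(verdict + "\n")
```

With the file names from the question, it would be called as classify_file('sentences.txt', 'validSentences').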

Or:

pattern = re.compile(r'([A-Z][a-z\s,]*)(?:: ["‘][a-z,!?\.\s]*["’][.,!?]|; [a-zA-Z\s]*[!.?]|\s["‘][a-z,.;!?\s]*["’]|[.?!])\s*')
#...
with open('sentences.txt',encoding='utf8') as file:
    for line in file:
....
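
As a quick check of this second option, the sketch below runs the pattern above against a few lines taken from the sample file, without stripping them first. The trailing space and newline on each string are an assumption about how the lines look when read from the file.

```python
import re

# Pattern from the answer: non-capturing alternation plus a trailing \s*.
pattern = re.compile(r'([A-Z][a-z\s,]*)(?:: ["‘][a-z,!?\.\s]*["’][.,!?]|; [a-zA-Z\s]*[!.?]|\s["‘][a-z,.;!?\s]*["’]|[.?!])\s*')

# Assumed raw lines, with trailing whitespace left in place.
raw_lines = [
    "How can you say that to me? \n",
    "We will not stop you; I promise! \n",
    "I do not know... \n",
]

# The trailing \s* absorbs the space and newline, so no rstrip() is needed.
results = [pattern.fullmatch(line) is not None for line in raw_lines]
print(results)  # [True, True, False]
```

The last line is rejected because the pattern allows only a single end mark, not because of its trailing whitespace.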
