Why does my regex work on regexr.com but throw an error when run from the command line?

Question

I have two problems with a regular expression that is meant to locate file paths.

1) Main problem: I get an error message I don't understand.
2) Before I changed some small parts of the script, it ran, but the regex search returned nothing.

The regex does work when I test it on regexr.com and pythex.org, where the matches land in the right places. It doesn't work when I run it from the command line.

Here is the regular expression:

'([a-zA-Z]:\\)([a-zA-Z0-9 ]*\\)*([a-zA-Z0-9 ]*\/)*([a-zA-Z0-9 ])*(\.[a-zA-Z]*)*'

And here is the code that uses it:

import os
import re

#run script from directory the script is in - place it in the dir being processed
start_path = os.path.dirname(os.path.realpath(__file__))
metadata_path = start_path + "\Metadata"

#change directory to the metadata folder where email.txt is
try:
    os.chdir(metadata_path)
except: print ('Could not change directory. Please try again.')

with open("email.txt", 'r', encoding = 'utf-8') as file:
    all_lines = file.readlines()
    no_header = all_lines[5:] #remove the header lines from email.txt
new_lines =[]
all_files=[]
unique_files =[]
for i in range(len(no_header)):#remove square character
    new_lines.append(re.sub('\S\-\d+', '',no_header[i]))

for i in range(len(new_lines)):#capture all the names of files containing personal emails
    test = re.search('([a-zA-Z]:\\)([a-zA-Z0-9 ]*\\)*([a-zA-Z0-9 ]*\/)*([a-zA-Z0-9 ])*(\.[a-zA-Z]*)*',new_lines[i])
    print (test)

I get the error message 're.error: missing ), unterminated subpattern at position 0'.

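It has an even number of parentheses, and as far as I can tell they pair up correctly. I suspect it has something to do with how I'm grouping things in the pattern.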

As for it returning nothing, am I missing some Python-specific rule that the online testers don't catch?

Thanks!

Answer 1

My guess is that the expression may be missing an r prefix (making it a raw string) or a parenthesis somewhere:

Test

import re

regex = r"([a-zA-Z]:\\)([a-zA-Z0-9 ]*\\)*([a-zA-Z0-9 ]*\/)*([a-zA-Z0-9 ])*(\.[a-zA-Z]*)*"

test_str = "a:\\a\\a/a.a"

print(re.search(regex, test_str))

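The expression is explained in the top-right panel of regex101.com if you want to explore, simplify, or modify it, and there you can also watch how it matches against some sample inputs.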
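To see the same breakdown from Python rather than regex101, here is a small sketch reusing the test pattern and string above. It prints the full match and then every capture group. Note that a group followed by * keeps only its last repetition, so groups 2 to 4 hold fragments of the path rather than every folder.

```python
import re

# Answer 1's pattern as a raw string, so "\\" reaches re as an escaped backslash.
regex = r"([a-zA-Z]:\\)([a-zA-Z0-9 ]*\\)*([a-zA-Z0-9 ]*\/)*([a-zA-Z0-9 ])*(\.[a-zA-Z]*)*"
test_str = "a:\\a\\a/a.a"  # the actual text is a:\a\a/a.a

match = re.search(regex, test_str)
print(match.group(0))  # a:\a\a/a.a  (the whole path)

# A repeated group only remembers its final repetition.
print(match.groups())  # ('a:\\', 'a\\', 'a/', 'a', '.a')
```

If you need every folder name separately, splitting the full match (match.group(0)) is usually simpler than relying on repeated groups.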

Code

import os
import re

#run script from directory the script is in - place it in the dir being processed
start_path = os.path.dirname(os.path.realpath(__file__))
metadata_path = os.path.join(start_path, "Metadata")

#change directory to the metadata folder where email.txt is
try:
    os.chdir(metadata_path)
except: print ('Could not change directory. Please try again.')

with open("email.txt", 'r', encoding = 'utf-8') as file:
    all_lines = file.readlines()
    no_header = all_lines[5:] #remove the header lines from email.txt
new_lines =[]
all_files=[]
unique_files =[]
for i in range(len(no_header)):#remove square character
    new_lines.append(re.sub(r'\S\-\d+', '',no_header[i]))

for i in range(len(new_lines)):#capture all the names of files containing personal emails
    test = re.search(r'([a-zA-Z]:\\)([a-zA-Z0-9 ]*\\)*([a-zA-Z0-9 ]*\/)*([a-zA-Z0-9 ])*(\.[a-zA-Z]*)*',new_lines[i])
    print (test)
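The loop above prints a Match object for lines that contain a path and None for every line that doesn't, so output full of None can look as if the search "returned nothing". Below is a minimal sketch that keeps only the matched text. It assumes the goal is the de-duplicated list that the unused unique_files variable suggests; find_paths, PATH_RE and the sample lines are hypothetical names for illustration only.

```python
import re

# Same pattern as in the answer, compiled once as a raw string.
PATH_RE = re.compile(
    r"([a-zA-Z]:\\)([a-zA-Z0-9 ]*\\)*([a-zA-Z0-9 ]*\/)*([a-zA-Z0-9 ])*(\.[a-zA-Z]*)*"
)

def find_paths(lines):
    """Hypothetical helper: return unique matched paths in first-seen order."""
    found = []
    for line in lines:
        match = PATH_RE.search(line)
        if match is None:  # re.search returns None when the line has no path
            continue
        path = match.group(0)  # the whole matched path, not a single group
        if path not in found:
            found.append(path)
    return found

sample = [
    "Found in C:\\Users\\Me\\notes.txt",
    "no path here",
    "Found in C:\\Users\\Me\\notes.txt",
]
print(find_paths(sample))  # ['C:\\Users\\Me\\notes.txt']
```

Compiling the pattern once with re.compile avoids re-parsing it for every line, although the re module also caches recently used patterns on its own.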

Answer 2

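This is because of the \\ sequences (columns 12 and 29). In an ordinary Python string, each \\ is read as a single \, and the regex engine then treats that \ as escaping the following ), so those two parentheses become literal characters and their groups are never closed. The simplest way to fix this is to "double escape" your backslashes: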

'([a-zA-Z]:\\\\)([a-zA-Z0-9 ]*\\\\)*([a-zA-Z0-9 ]*\/)*([a-zA-Z0-9 ])*(\.[a-zA-Z]*)*'

It's ugly, but it gets the job done.
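A short sketch of this explanation, using only the first group of the question's pattern: the ordinary string literal loses one backslash before re sees it, re.compile then fails with the same error as in the question, and the doubled-backslash and raw-string spellings turn out to be the identical string.

```python
import re

# First group of the question's pattern, written as an ordinary string literal.
# Python collapses "\\" to a single backslash before re ever sees the pattern.
plain = '([a-zA-Z]:\\)'
print(plain)  # ([a-zA-Z]:\)   re reads "\)" as a literal ")", so the group never closes

try:
    re.compile(plain)
except re.error as exc:
    print("re.error:", exc)  # re.error: missing ), unterminated subpattern at position 0

# Two equivalent fixes: double every backslash, or use a raw string.
doubled = '([a-zA-Z]:\\\\)'
raw = r'([a-zA-Z]:\\)'
print(doubled == raw)  # True: both become ([a-zA-Z]:\\) after Python parses them

print(re.match(raw, 'C:\\').group(0))  # C:\
```

The raw string is generally the easier of the two to read, which is why it is the usual choice for regex patterns in Python.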

2 answers
Answer 1 (score 1), 2019-07-23 16:41:50
Answer 2 (score 0), 2019-07-23 16:44:32
