How to print matching strings in Python with regex?
I am working on a Python script that goes through a directory of files and extracts the strings that match a certain pattern. More specifically, I'm trying to extract the values of the serial number and the max-limit. The lines look something like this:
#serial number = 642E0523D775
max-limit=50M/50M
I've got the script to go through the files, but it doesn't actually print the values I want. Instead of the values, I just get the 'Nothing found' output.
I think it probably has something to do with the regex I'm using, but I can't for the life of me figure out how to formulate it.
The script I've come up with so far:
import os
import re
# Where I'm searching
user_input = "/path/to/files/"
directory = os.listdir(user_input)
# What I'm looking for
searchstring = ['serial number', 'max-limit']
re_first = re.compile(r'serial.\w.*')
re_second = re.compile(r'max-limit=\w*.\w*')
# Regex combine
regex_list = [re_first, re_second]
# Looking
for fname in directory:
    if os.path.isfile(user_input + os.sep + fname):
        # Full path
        f = open(user_input + os.sep + fname, 'r')
        f_contents = f.read()
        content = fname + f_contents
        files = os.listdir(user_input)
        lines_seen = set()
        for f in files:
            print(f)
            if f not in lines_seen:  # not a duplicate
                for regex in regex_list:
                    matches = re.findall(regex, content)
                    if matches != None:
                        for match in matches:
                            print(match)
                    else:
                        print('Nema')
        f.close()
Per the documentation, the re module's match() searches for "characters at the beginning of a string [that] match the regular expression pattern". Since you are prepending your file contents with the file name in the line:
content=fname + f_contents
and then matching your pattern against content in the line:
result=re.match(regex, content)
the pattern is only tested at the start of the string, where the file name sits, so there will never be a match.
Since you want to locate a match anywhere in the string, use search() instead.
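To make the difference concrete, here is a minimal sketch. The file name router1.cfg and the pattern max-limit=\S+ are assumptions chosen for illustration; only the two sample lines come from the question.

```python
import re

# Assumed pattern for illustration: "max-limit=" followed by non-whitespace.
pattern = re.compile(r"max-limit=\S+")

# Hypothetical file name glued in front of the file's text, as in the question.
content = "router1.cfg" + "#serial number = 642E0523D775\nmax-limit=50M/50M\n"

# match() only tests at position 0, where the file name sits.
at_start = pattern.match(content)

# search() scans forward and returns the first hit anywhere in the string.
anywhere = pattern.search(content)

print(at_start)           # None
print(anywhere.group(0))  # max-limit=50M/50M
```

If every occurrence is needed rather than the first one, findall() or finditer() scan the whole string in the same way search() does.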
See also: search() vs match()
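Putting this together, one possible shape for the whole script is sketched below. The helper names extract_values and scan_directory are hypothetical, and the two patterns assume that every file uses exactly the line formats shown in the question; adjust them if the real files differ.

```python
import os
import re

# Assumed line formats, taken from the two sample lines in the question.
SERIAL_RE = re.compile(r"serial number\s*=\s*(\w+)")
LIMIT_RE = re.compile(r"max-limit=(\S+)")


def extract_values(text):
    """Return (serial, max_limit); each item is None when it is not found."""
    serial = SERIAL_RE.search(text)
    limit = LIMIT_RE.search(text)
    return (
        serial.group(1) if serial else None,
        limit.group(1) if limit else None,
    )


def scan_directory(path):
    """Map each regular file name in `path` to its extracted values."""
    results = {}
    for fname in sorted(os.listdir(path)):
        full_path = os.path.join(path, fname)
        if not os.path.isfile(full_path):
            continue
        # errors="replace" keeps the scan going on files that are not clean text.
        with open(full_path, "r", errors="replace") as handle:
            results[fname] = extract_values(handle.read())
    return results


sample = "#serial number = 642E0523D775\nmax-limit=50M/50M\n"
print(extract_values(sample))  # ('642E0523D775', '50M/50M')
```

The capture groups return only the values rather than the whole line, and the file contents are searched on their own instead of being concatenated with the file name.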
The pattern ^[\\w&.\\-]+$ provided would match neither serial number = 642E0523D775, as it contains a space (" "), nor max-limit=50M/50M, as it contains a forward slash ("/"). Both also contain an equals sign ("="), which cannot be matched by your pattern.
Additionally, as written, the character class in this pattern matches the backslash ("\"), so you may want to remove it (the dash ("-") does not need to be escaped when it is at the end of the character class).
A pattern that matches both of these strings could be:
^[\\w&. \\/=\\-]+$
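These claims can be checked with a short sketch. It assumes the doubled backslashes in the patterns are string-literal escapes, so the regex engine actually sees \w and \-; the variable names are illustrative only. The last pattern shows what happens when the doubled backslashes reach the engine unchanged, as they do in a Python raw string.

```python
import re

serial_line = "serial number = 642E0523D775"
limit_line = "max-limit=50M/50M"

# Assumption: the doubled backslashes are string-literal escapes,
# so the regex engine sees \w and \- inside the class.
narrow = re.compile("^[\\w&.\\-]+$")
wide = re.compile("^[\\w&. \\/=\\-]+$")

# Same class as `wide`, as a raw string with the dash left unescaped at the end.
wide_raw = re.compile(r"^[\w&. /=-]+$")

for line in (serial_line, limit_line):
    print(line, bool(narrow.match(line)), bool(wide.match(line)), bool(wide_raw.match(line)))

# In a raw string the doubled backslash is a literal backslash inside the class,
# and \w is no longer the word-character shorthand.
literal = re.compile(r"^[\\w&.\\-]+$")
print(bool(literal.match("\\")), bool(literal.match("abc")))
```

Writing regex patterns as raw strings in Python avoids this ambiguity, since each backslash then reaches the regex engine exactly as typed.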