Need help with a Python regex

I have a list of files in a directory:

gp_dump_0_10_20171112003450 <==
gp_dump_0_11_20171112003450 <==
gp_dump_0_12_20171112003450 <==
gp_dump_0_13_20171112003450 <==
gp_dump_0_14_20171112003450 <==
gp_dump_1_1_20171112003450 <==
gp_dump_1_1_20171112003450_post_data
gp_dump_20171112003450_ao_state_file
gp_dump_20171112003450_co_state_file
gp_dump_20171112003450_last_operation
gp_dump_20171112003450.rpt

I want to fetch only the marked (<==) files from the directory. Below is the Python code I have written, which is not working as expected:

import os
import re

dump_key = 20171112003450
backup_files = os.listdir('/home/jadhavy/backup/')
segment_file_regex = "gp_dump_\d+?_\d+?_%s$" %dump_key
for file in backup_files:
        if file == re.finditer(segment_file_regex,file,re.S):
                print(file)

EDIT: I changed the regex to match the end of the string, but I still get no results after running this.

Two main things:

  1. To check whether a string matches a pattern, the function you want is re.match, not re.finditer. re.finditer returns an iterator of match objects, so comparing it to file with == is always False. re.match returns a match object if the pattern matches at the beginning of the string, or None if there is no match.
  2. Without an end anchor, the regex also matches gp_dump_1_1_20171112003450_post_data, because that name starts with a match. The $ metacharacter means end-of-string, so putting it at the end of the pattern stops it from matching strings with trailing characters.

Here is your code with the above adjustments:

import os
import re

dump_key = 20171112003450
backup_files = os.listdir('/home/jadhavy/backup/')
segment_file_regex = "gp_dump_\d+?_\d+?_%s$" %dump_key
for file in backup_files:
        if re.match(segment_file_regex,file,re.S):
                print(file)
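
To see the two points above in isolation, here is a small sketch that runs against a hard-coded subset of the file names from the question instead of a real directory. The variable names (names, unanchored, anchored, and the three result lists) are made up for illustration.

```python
import re

# File names copied from the question, so no real directory is needed.
names = [
    "gp_dump_0_10_20171112003450",
    "gp_dump_1_1_20171112003450",
    "gp_dump_1_1_20171112003450_post_data",
    "gp_dump_20171112003450_ao_state_file",
    "gp_dump_20171112003450.rpt",
]

dump_key = 20171112003450
unanchored = r"gp_dump_\d+?_\d+?_%s" % dump_key
anchored = unanchored + "$"

# finditer returns an iterator, which never compares equal to a str,
# so the original condition rejects every file.
never_equal = [n for n in names if n == re.finditer(anchored, n)]

# match without $ accepts any name that merely starts with the pattern.
loose = [n for n in names if re.match(unanchored, n)]

# match with $ also requires the name to end right after the dump key.
strict = [n for n in names if re.match(anchored, n)]

print(never_equal)
print(loose)
print(strict)
```

The first list comes out empty, the second still includes the _post_data file, and only the third contains just the segment files.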

Three other tips:

  1. You shouldn't need the re.S flag in this case, because it only affects the . metacharacter, and this pattern doesn't use it.
  2. Raw strings are usually a good idea when writing regexes, since regexes tend to contain lots of backslashes and a raw string keeps Python from interpreting them as string escapes. For example, r'\n' is the same as '\\n' (a backslash followed by n), not '\n' (a newline).
  3. When inserting a string into a regex, you can use re.escape to escape metacharacters. For example, r'abc%sghi' % re.escape('[def]') becomes r'abc\[def\]ghi' instead of r'abc[def]ghi', which isn't the regex you'd want.
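
Putting the three tips together might look like the sketch below. The helper names segment_file_pattern and select_segment_files are hypothetical, and two details are my own substitutions, not part of the answer above: re.fullmatch replaces re.match plus $, and the lazy \d+? is written as plain \d+, which selects the same names once the whole string has to match.

```python
import re

def segment_file_pattern(dump_key):
    """Compile a pattern for names like gp_dump_<n>_<n>_<dump_key>."""
    # The raw string keeps \d as a regex escape; re.escape guards the
    # inserted key in case it ever contains regex metacharacters.
    return re.compile(r"gp_dump_\d+_\d+_%s" % re.escape(str(dump_key)))

def select_segment_files(names, dump_key):
    """Return only the names that are segment dump files for dump_key."""
    pattern = segment_file_pattern(dump_key)
    # fullmatch requires the whole name to match, so no $ and no re.S.
    return [name for name in names if pattern.fullmatch(name)]

# Sample names taken from the question.
sample = [
    "gp_dump_0_14_20171112003450",
    "gp_dump_1_1_20171112003450",
    "gp_dump_1_1_20171112003450_post_data",
    "gp_dump_20171112003450_last_operation",
    "gp_dump_20171112003450.rpt",
]

selected = select_segment_files(sample, 20171112003450)
print(selected)
```

With a real directory, the same helper could be called as select_segment_files(os.listdir('/home/jadhavy/backup/'), dump_key) after importing os.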
