Extract file names from a given directory with regex

I am pretty weak in regex. I'm looking for some help with how to extract the .sav file name from the following string:

C:\\Users...\\Standard Loadflows Seq and Dyn PSSEv34 - 2019-02-20\\AutumnHi-20180531-183047-34-SystemNormal\\AutumnHi-20180531-183047-34-SystemNormal.sav

Currently I am using this code:

re.findall(r'\\(.+).sav',txt)

but it only finds

['Users\\...\\Standard Loadflows Seq and Dyn PSSEv34 - 2019-02-20\\AutumnHi-20180531-183047-34-SystemNormal\AutumnHi-20180531-183047-34-SystemNormal.sav was']

I'm trying to find "AutumnHi-20180531-183047-34-SystemNormal.sav".

I am using Python 3.7.

You could match a backslash and then, in a capturing group, match one or more characters that are not a backslash using a negated character class. Then match a literal dot followed by sav.

You might use a negative lookahead to assert that what is directly to the right is not a non-whitespace character, so the match must end at whitespace or at the end of the string.

\\([^\\]+\.sav)(?!\S)

Regex demo
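A minimal sketch of this pattern in Python 3, assuming the path is held in a raw string with single backslashes (the sample is the path from the question, with its "..." elision kept). The extra inputs are hypothetical: appending " was loaded" shows why the (?!\S) lookahead helps when the path sits inside longer text, and case.savx shows a longer extension being rejected.

```python
import re

# Sample path from the question, single backslashes, "..." elision kept as-is
txt = r"C:\Users...\Standard Loadflows Seq and Dyn PSSEv34 - 2019-02-20\AutumnHi-20180531-183047-34-SystemNormal\AutumnHi-20180531-183047-34-SystemNormal.sav"

# \\         a literal backslash
# ([^\\]+    capture one or more characters that are not backslashes...
#   \.sav)   ...followed by a literal ".sav"
# (?!\S)     not directly followed by a non-whitespace character
pattern = r"\\([^\\]+\.sav)(?!\S)"

names = re.findall(pattern, txt)
print(names)

# Hypothetical inputs: the path inside a longer sentence still matches,
# while a longer extension such as ".savx" is rejected by the lookahead
embedded = re.findall(pattern, txt + " was loaded")
rejected = re.findall(pattern, r"C:\data\case.savx")
print(embedded, rejected)
```

Because the pattern has one capturing group, re.findall returns only the group, so the leading backslash is not part of the result.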

Regex101 (link):

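import re

# Raw string: the backslashes are kept exactly as typed
txt = r'''C:\Users\\...\\Standard Loadflows Seq and Dyn PSSEv34 - 2019-02-20\\WinterLo-20180729-043047-34-SystemNormal\\WinterLo-20180729-043047-34-SystemNormal.sav'''

# (?<=\\) requires the match to start right after a backslash;
# [^\\]+sav then takes non-backslash characters ending in "sav"
print(re.findall(r'(?<=\\)[^\\]+sav',txt)[0])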

Prints:

WinterLo-20180729-043047-34-SystemNormal.sav

You could achieve the same without the re module:

print(txt.split('\\')[-1])
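For comparison, a small sketch of a few non-regex variants, assuming a Windows-style path written with single backslashes. str.rsplit with a maxsplit of 1 cuts only once from the right, and ntpath.basename from the standard library applies Windows path rules on any operating system; the variable names are only illustrative.

```python
import ntpath

# Same kind of Windows path as above, written with single backslashes
txt = r"C:\Users\...\Standard Loadflows Seq and Dyn PSSEv34 - 2019-02-20\WinterLo-20180729-043047-34-SystemNormal\WinterLo-20180729-043047-34-SystemNormal.sav"

by_split = txt.split("\\")[-1]       # split on every backslash, keep the last piece
by_rsplit = txt.rsplit("\\", 1)[-1]  # split only once, starting from the right
by_ntpath = ntpath.basename(txt)     # Windows path rules, regardless of the host OS

print(by_split)
print(by_rsplit)
print(by_ntpath)
```

All three give WinterLo-20180729-043047-34-SystemNormal.sav for this input. Note that a path ending in a backslash yields an empty string with all three, since its last component is empty.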

The following pattern should match the filename:
(?=[^\\]*$).*\.sav

Regex101 Demo

The above pattern asserts (?= is a positive lookahead) that no character from the current position to the end of the string is a backslash. So essentially the match can only start right after the last backslash, and it then matches the desired text. For other details, see "EXPLANATION" on the right side of the regex101 demo at the link above.
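A minimal sketch of the lookahead approach in Python, assuming the path is stored in a raw string with single backslashes. Inside a raw string the pattern needs only one level of escaping, so the class is written [^\\] and the literal dot \. ; re.search is used here since only one file name is expected.

```python
import re

# Sample path from the question (single backslashes, "..." elision kept)
txt = r"C:\Users...\Standard Loadflows Seq and Dyn PSSEv34 - 2019-02-20\AutumnHi-20180531-183047-34-SystemNormal\AutumnHi-20180531-183047-34-SystemNormal.sav"

# (?=[^\\]*$)  lookahead: from this position to the end there is no backslash
# .*\.sav      then consume everything up to and including ".sav"
pattern = r"(?=[^\\]*$).*\.sav"

# re.search tries start positions left to right, so the first position that
# passes the lookahead is the one right after the last backslash
match = re.search(pattern, txt)
filename = match.group(0) if match else None
print(filename)
```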

I am assuming you are not learning about regex but want to know how to handle parsing filenames.

I would use the pathlib module to handle parsing the filename.

C:\Users\barry>py -3.7
Python 3.7.2 (tags/v3.7.2:9a3ffc0492, Dec 23 2018, 23:09:28) [MSC v.1916 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import pathlib
>>> filename = r'C:\Users\...\Standard Loadflows Seq and Dyn PSSEv34 - 2019-02-20\WinterLo-20180729-043047-34-SystemNormal\WinterLo-20180729-043047-34-SystemNormal.sav'
>>> path = pathlib.Path(filename)
>>> path.name
'WinterLo-20180729-043047-34-SystemNormal.sav'
>>> path.parent
WindowsPath('C:/Users/.../Standard Loadflows Seq and Dyn PSSEv34 - 2019-02-20/WinterLo-20180729-043047-34-SystemNormal')
>>>
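One caveat worth checking: pathlib.Path only treats backslashes as separators when the script runs on Windows; on Linux or macOS it becomes a PosixPath and the whole string is a single name. A minimal sketch that avoids this, assuming you want Windows parsing rules everywhere, uses pathlib.PureWindowsPath, which also exposes the stem and the suffix.

```python
from pathlib import PureWindowsPath

filename = r"C:\Users\...\Standard Loadflows Seq and Dyn PSSEv34 - 2019-02-20\WinterLo-20180729-043047-34-SystemNormal\WinterLo-20180729-043047-34-SystemNormal.sav"

# PureWindowsPath parses with Windows rules on any OS and never touches the filesystem
path = PureWindowsPath(filename)

print(path.name)    # WinterLo-20180729-043047-34-SystemNormal.sav
print(path.stem)    # WinterLo-20180729-043047-34-SystemNormal
print(path.suffix)  # .sav
```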

I'm guessing that these expressions:

[^\\]+\.sav
([^\\]+\.sav)

or some similar derivative of those would likely extract what we want here.

Test

import re

print(re.findall(r"([^\\]+\.sav)", "C:\\Users...\\Standard Loadflows Seq and Dyn PSSEv34 - 2019-02-20\\AutumnHi-20180531-183047-34-SystemNormal\\AutumnHi-20180531-183047-34-SystemNormal.sav"))

Output

['AutumnHi-20180531-183047-34-SystemNormal.sav']

Demo
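If one string holds several paths, for example one per line, re.findall returns every match. A minimal sketch, assuming a hypothetical multi-line input (the C:\cases directory is made up): the pattern is a variant of the one above that also excludes \n from the class so a match cannot run across lines, and re.IGNORECASE is added as an assumption so that .SAV is accepted too. Without a capturing group, findall returns the whole match.

```python
import re

# Hypothetical input: one Windows path per line
log = "\n".join([
    r"C:\cases\AutumnHi-20180531-183047-34-SystemNormal\AutumnHi-20180531-183047-34-SystemNormal.sav",
    r"C:\cases\WinterLo-20180729-043047-34-SystemNormal\WinterLo-20180729-043047-34-SystemNormal.SAV",
])

# [^\\\n]+  one or more characters that are neither a backslash nor a newline
# \.sav     a literal ".sav" (case-insensitive because of re.IGNORECASE)
names = re.findall(r"[^\\\n]+\.sav", log, flags=re.IGNORECASE)
print(names)
```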

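Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you need to reprint, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.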

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM