How can I get a filename from a string using regex?
I have this string here:
"['\r\n File: FLO_JIUWASOKLDM_05_HetR_IUSJA_&_Cracks.mp4', <br/>, '\r\n Size: 48.14 MB ']"
and I have this regex: \\w+\\.\\w+
And I want the regex to get the filename FLO_JIUWASOKLDM_05_HetR_IUSJA_&_Cracks.mp4
But it breaks at the ampersand and returns _Cracks.mp4
What do I need to do to fix it? I'm super new to regex.
There are many options here; one, for example, would be:
([^\s]+\.[a-z][a-z0-9]+)
# coding=utf8
# the above tag defines encoding for this document and is for Python 2.x compatibility
import re
regex = r"([^\s]+\.[a-z][a-z0-9]+)"
test_str = "\"['\\r\\n File: FLO_JIUWASOKLDM_05_HetR_IUSJA_&_Cracks.mp4', <br/>, '\\r\\n Size: 48.14 MB ']\"\n"
matches = re.finditer(regex, test_str, re.MULTILINE | re.IGNORECASE)
for matchNum, match in enumerate(matches, start=1):
    print("Match {matchNum} was found at {start}-{end}: {match}".format(matchNum = matchNum, start = match.start(), end = match.end(), match = match.group()))
    for groupNum in range(0, len(match.groups())):
        groupNum = groupNum + 1
        print("Group {groupNum} found at {start}-{end}: {group}".format(groupNum = groupNum, start = match.start(groupNum), end = match.end(groupNum), group = match.group(groupNum)))
# Note: for Python 2.7 compatibility, use ur"" to prefix the regex and u"" to prefix the test string and substitution.
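If only the first match is needed, the same pattern can be used in a shorter form. The following is a minimal sketch, assuming the string from the question is stored as text with literal \r\n sequences (hence the raw string); the variable names are illustrative only. Because the extension part [a-z][a-z0-9]+ must start with a letter, 48.14 is not picked up by this pattern.

```python
import re

# Assumed input: the string from the question, with literal backslash sequences.
text = r"""['\r\n File: FLO_JIUWASOKLDM_05_HetR_IUSJA_&_Cracks.mp4', <br/>, '\r\n Size: 48.14 MB ']"""

# A run of non-whitespace, a dot, then an extension that starts with a letter.
pattern = re.compile(r"([^\s]+\.[a-z][a-z0-9]+)", re.IGNORECASE)

first = pattern.search(text)
filename = first.group(1) if first else None  # None when nothing matches
all_matches = pattern.findall(text)

print(filename)     # FLO_JIUWASOKLDM_05_HetR_IUSJA_&_Cracks.mp4
print(all_matches)  # only the filename; "48.14" has no letter after the dot
```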
\w is the shorthand for "word character", meaning letters, digits, and underscore. Note the lack of ampersand. To include the ampersand, you could use the character class [\w&]. Your regex would then be
[\w&]+\.\w+
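A quick way to see the difference between the two patterns is to run both against the string from the question. This is only a sketch: it assumes the input is held as text with literal \r\n sequences, and the variable names are made up for illustration.

```python
import re

# Assumed input: the string from the question, with literal backslash sequences.
text = r"""['\r\n File: FLO_JIUWASOKLDM_05_HetR_IUSJA_&_Cracks.mp4', <br/>, '\r\n Size: 48.14 MB ']"""

# \w does not match "&", so the first successful match starts after the ampersand.
original = re.search(r"\w+\.\w+", text).group()

# Adding "&" to the character class lets the match span the whole name.
fixed = re.search(r"[\w&]+\.\w+", text).group()

print(original)  # _Cracks.mp4
print(fixed)     # FLO_JIUWASOKLDM_05_HetR_IUSJA_&_Cracks.mp4
```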
By the way, this might also match 48.14, depending on the regex function you use. But maybe you want to include more characters than just the ampersand. How about all non-whitespace characters?
\S+\.\w+
This uses \S, which is the inversion of the whitespace shorthand \s.
Instead of figuring out which characters the file name may contain (note that it may even contain spaces), you may leverage the context: you know it starts after File: and space(s) and runs up to the '.
So, you may achieve what you need using
import re
# s is the input string from the question
m = re.search(r"File:\s*([^']+)", s)
if m:
    print(m.group(1))
See the online Python demo. See also the regex demo and the regex graph.
Details
File: - a literal substring
\s* - 0+ whitespaces
([^']+) - Capturing group 1 (match_object.group(1)): 1 or more chars other than '
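The context-based pattern can also be wrapped in a small helper that returns None when there is no match. This is a sketch under assumptions: the name extract_filename and the sample strings other than the one from the question are hypothetical, and the input is assumed to always delimit the name with a single quote.

```python
import re
from typing import Optional

# Hypothetical helper; the pattern is the one described in the details above.
FILE_RE = re.compile(r"File:\s*([^']+)")

def extract_filename(s: str) -> Optional[str]:
    """Return the text between 'File:' and the next single quote, or None."""
    m = FILE_RE.search(s)
    return m.group(1) if m else None

# The string from the question, kept with literal backslash sequences.
sample = r"""['\r\n File: FLO_JIUWASOKLDM_05_HetR_IUSJA_&_Cracks.mp4', <br/>, '\r\n Size: 48.14 MB ']"""
# Made-up input showing that names with spaces are captured whole.
with_spaces = "['File: my holiday video.mp4', 'Size: 1 MB']"

print(extract_filename(sample))          # FLO_JIUWASOKLDM_05_HetR_IUSJA_&_Cracks.mp4
print(extract_filename(with_spaces))     # my holiday video.mp4
print(extract_filename("no file here"))  # None
```

Since [^']+ captures everything up to the quote, any whitespace between the name and the closing quote would be included; calling .strip() on the result is one way to handle that.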