How do I use a regular expression to get a file name from a string

Question

I have this string here:

"['\r\n                    File: FLO_JIUWASOKLDM_05_HetR_IUSJA_&_Cracks.mp4', <br/>, '\r\n                    Size: 48.14 MB                ']"

and I have this regex: \w+\.\w+

I want the regex to get the file name FLO_JIUWASOKLDM_05_HetR_IUSJA_&_Cracks.mp4

But it breaks at the ampersand and returns _Cracks.mp4. What do I need to do to fix it? I'm super new to regex.

Answer 1

There are many options here. One, for example, would be:

([^\s]+\.[a-z][a-z0-9]+)

Test

# coding=utf8
# the line above declares the source encoding and is only needed for Python 2.x compatibility

import re

regex = r"([^\s]+\.[a-z][a-z0-9]+)"

test_str = "\"['\\r\\n                    File: FLO_JIUWASOKLDM_05_HetR_IUSJA_&_Cracks.mp4', <br/>, '\\r\\n                    Size: 48.14 MB                ']\"\n"

matches = re.finditer(regex, test_str, re.MULTILINE | re.IGNORECASE)

for matchNum, match in enumerate(matches, start=1):

    print ("Match {matchNum} was found at {start}-{end}: {match}".format(matchNum = matchNum, start = match.start(), end = match.end(), match = match.group()))

    # Capturing groups are numbered from 1; group 0 is the whole match.
    for groupNum in range(1, len(match.groups()) + 1):

        print ("Group {groupNum} found at {start}-{end}: {group}".format(groupNum = groupNum, start = match.start(groupNum), end = match.end(groupNum), group = match.group(groupNum)))

# Note: for Python 2.7 compatibility, use ur"" to prefix the regex and u"" to prefix the test string and substitution.

Answer 2

\w is the shorthand for "word character", meaning letters, digits, and underscore. Note the lack of ampersand. To include the ampersand, you could use the character class [\w&]. Your regex would then be

[\w&]+\.\w+

Note that this may also match 48.14, depending on the regex function you use.
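As a rough illustration of that caveat, the sketch below runs the pattern through Python's re module in two ways. The variable names are made up for the example, and the string is the one from the question with real carriage return and line feed characters in place of \r\n.

```python
import re

# The string from the question (assumed to contain real CR/LF characters).
s = "['\r\n                    File: FLO_JIUWASOKLDM_05_HetR_IUSJA_&_Cracks.mp4', <br/>, '\r\n                    Size: 48.14 MB                ']"

pattern = re.compile(r"[\w&]+\.\w+")

# findall returns every non-overlapping match, so the size is picked up too.
all_matches = pattern.findall(s)

# search stops at the first match, which here is the file name.
first_match = pattern.search(s).group()

print(all_matches)
print(first_match)
```

If only the first hit is wanted, re.search is enough here. With re.findall or re.finditer, the 48.14 match would have to be filtered out.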

But maybe you want to include more characters than just the ampersand. How about all non-whitespace characters?

\S+\.\w+

This uses \S, which is the inverse of the whitespace shorthand \s.
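A minimal sketch of the difference between the two patterns, using a made-up file name (not from the question) that contains a hyphen and parentheses as well as an ampersand:

```python
import re

# Hypothetical input, chosen only to show punctuation other than "&".
sample = "File: report-v2(final)&copy.mp4"

# The character class only allows word characters and "&", so the match
# starts after the last disallowed character, the ")".
narrow = re.search(r"[\w&]+\.\w+", sample).group()

# \S accepts any non-whitespace character, so the whole name is matched.
broad = re.search(r"\S+\.\w+", sample).group()

print(narrow)
print(broad)
```

The trade-off is that \S+ also swallows any punctuation directly attached to the front of the name, such as an opening quote, so it relies on the name being preceded by whitespace, as it is in the question's string.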

Answer 3

Instead of figuring out which characters the file name may contain (note that it may even contain spaces), you may leverage the context: you know it starts after File: and space(s) and runs up to the ' .

So, you may achieve what you need using

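import re

# s is the string from the question
s = "['\r\n                    File: FLO_JIUWASOKLDM_05_HetR_IUSJA_&_Cracks.mp4', <br/>, '\r\n                    Size: 48.14 MB                ']"
m = re.search(r"File:\s*([^']+)", s)
if m:
    print(m.group(1))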

Details

File: - a literal substring
\s* - zero or more whitespace characters
([^']+) - Capturing group 1 ( match_object.group(1) ): one or more characters other than ' .
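The pattern described above could be wrapped in a small helper, sketched here. The function name extract_filename and the second sample string are hypothetical and only serve to show that a name containing spaces is captured whole, and that a missing File: label yields None instead of an error.

```python
import re

def extract_filename(text):
    """Return the text between 'File:' and the next single quote, or None."""
    m = re.search(r"File:\s*([^']+)", text)
    # strip() drops any whitespace left between the name and the quote.
    return m.group(1).strip() if m else None

# The string from the question (assumed to contain real CR/LF characters).
question_string = "['\r\n                    File: FLO_JIUWASOKLDM_05_HetR_IUSJA_&_Cracks.mp4', <br/>, '\r\n                    Size: 48.14 MB                ']"

# Hypothetical input with spaces and parentheses in the file name.
spaced_string = "['\r\n    File: My Holiday Video (final).mp4', <br/>, '\r\n    Size: 12.50 MB    ']"

name_from_question = extract_filename(question_string)
name_with_spaces = extract_filename(spaced_string)
missing = extract_filename("Size: 48.14 MB")

print(name_from_question)
print(name_with_spaces)
print(missing)
```

One limitation of this approach: a file name that itself contains an apostrophe would be cut off at that character, since ' is used as the end marker.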
