How can I get a filename from a string using regex?

I have this string here:

"['\r\n                    File: FLO_JIUWASOKLDM_05_HetR_IUSJA_&_Cracks.mp4', <br/>, '\r\n                    Size: 48.14 MB                ']"

and I have this regex: \w+\.\w+

And I want the regex to get the filename FLO_JIUWASOKLDM_05_HetR_IUSJA_&_Cracks.mp4

But it breaks at the ampersand and returns only _Cracks.mp4. What do I need to do to fix it? I'm super new to regex.

There are many options here; one, for example, would be:

([^\s]+\.[a-z][a-z0-9]+)

Test

# coding=utf8
# the above tag defines encoding for this document and is for Python 2.x compatibility

import re

regex = r"([^\s]+\.[a-z][a-z0-9]+)"

test_str = "\"['\\r\\n                    File: FLO_JIUWASOKLDM_05_HetR_IUSJA_&_Cracks.mp4', <br/>, '\\r\\n                    Size: 48.14 MB                ']\"\n"

matches = re.finditer(regex, test_str, re.MULTILINE | re.IGNORECASE)

for match_num, match in enumerate(matches, start=1):
    print("Match {match_num} was found at {start}-{end}: {match}".format(match_num=match_num, start=match.start(), end=match.end(), match=match.group()))

    # Groups are numbered from 1; group 0 is the whole match
    for group_num in range(1, len(match.groups()) + 1):
        print("Group {group_num} found at {start}-{end}: {group}".format(group_num=group_num, start=match.start(group_num), end=match.end(group_num), group=match.group(group_num)))

# Note: for Python 2.7 compatibility, use ur"" to prefix the regex and u"" to prefix the test string and substitution.
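As a shorter check of the same pattern, here is a minimal sketch using re.findall. It assumes the question's text is held in a raw string `s` (so the `\r\n` sequences stay as literal characters); that variable name is only for illustration. Because the pattern requires a letter right after the dot, the size value 48.14 is not picked up.

```python
import re

# Assumption: the question's text as a raw string, with \r\n kept literal.
s = r"['\r\n                    File: FLO_JIUWASOKLDM_05_HetR_IUSJA_&_Cracks.mp4', <br/>, '\r\n                    Size: 48.14 MB                ']"

# [^\s]+ is any run of non-whitespace; the extension must start with a letter.
regex = r"([^\s]+\.[a-z][a-z0-9]+)"

# With one capturing group, findall returns the text of that group per match.
found = re.findall(regex, s, re.IGNORECASE)
print(found)
```

Note that `[^\s]` is equivalent to `\S`, so the pattern could also be written as `(\S+\.[a-z][a-z0-9]+)`.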

\w is the shorthand for "word character", meaning letters, digits, and underscore. Note the lack of ampersand. To include the ampersand, you could use the character class [\w&]. Your regex would then be

[\w&]+\.\w+
  • BTW, note that this may also match 48.14, depending on the regex function you use.
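To illustrate the point about the regex function, here is a small sketch comparing re.search and re.findall with this pattern. The variable `s` is an assumption: the question's text stored as a raw string so that `\r\n` stays literal.

```python
import re

# Assumption: the question's text as a raw string, with \r\n kept literal.
s = r"['\r\n                    File: FLO_JIUWASOKLDM_05_HetR_IUSJA_&_Cracks.mp4', <br/>, '\r\n                    Size: 48.14 MB                ']"

pattern = r"[\w&]+\.\w+"

# re.search stops at the first match, which is the file name here.
first = re.search(pattern, s).group()

# re.findall collects every match, so the size value shows up as well.
all_matches = re.findall(pattern, s)

print(first)
print(all_matches)
```

If only the file name is wanted, taking the first match is enough for this input, but that relies on the file name appearing before the size.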

But maybe you want to include more characters than just ampersand. How about all non-whitespace characters?

\S+\.\w+
  • This uses \S, which is the inverse of the whitespace shorthand \s.
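A quick sketch of how \S+\.\w+ behaves, under two assumptions: `s` is the question's text as a raw string, and `spaced` is a made-up input with a hypothetical file name that contains spaces. It suggests that the pattern handles the ampersand but captures only the last word of a name with spaces.

```python
import re

# Assumption: the question's text as a raw string, with \r\n kept literal.
s = r"['\r\n                    File: FLO_JIUWASOKLDM_05_HetR_IUSJA_&_Cracks.mp4', <br/>, '\r\n                    Size: 48.14 MB                ']"

pattern = r"\S+\.\w+"

# The ampersand is non-whitespace, so the whole file name is matched.
matches = re.findall(pattern, s)
print(matches)

# Hypothetical file name with spaces: \S+ cannot cross the spaces.
spaced = "File: My Holiday Video.mp4"
partial = re.search(pattern, spaced).group()
print(partial)
```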

Instead of figuring out what characters the file name may contain (note that it may even contain spaces), you may leverage the context: you know it starts after File: and space(s) and runs up to the ' character.

So, you may achieve what you need using

import re

# s holds the input string from the question
m = re.search(r"File:\s*([^']+)", s)
if m:
    print(m.group(1))

Details

  • File: - a literal substring
  • \s* - zero or more whitespace characters
  • ([^']+) - Capturing group 1 (match_object.group(1)): one or more characters other than '.
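The pieces above can be wrapped in a small helper. This is only a sketch: the function name `extract_filename`, the `.strip()` call, and the second test input (a hypothetical file name with spaces) are illustrative additions, not part of the original answer.

```python
import re

def extract_filename(text):
    """Return the text after 'File:' up to the next single quote, or None."""
    m = re.search(r"File:\s*([^']+)", text)
    # .strip() is an added precaution against padding before the closing quote.
    return m.group(1).strip() if m else None

# Assumption: the question's text as a raw string, with \r\n kept literal.
s = r"['\r\n                    File: FLO_JIUWASOKLDM_05_HetR_IUSJA_&_Cracks.mp4', <br/>, '\r\n                    Size: 48.14 MB                ']"

print(extract_filename(s))

# Hypothetical input whose file name contains spaces.
print(extract_filename("['File: My Holiday Video.mp4', 'Size: 1 MB']"))

# No 'File:' marker at all: the helper returns None instead of raising.
print(extract_filename("Size: 48.14 MB"))
```

One limitation of relying on the closing quote: a file name that itself contains an apostrophe would be cut off at that character.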

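Notice: technical posts on this site are published under the CC BY-SA 4.0 license. If you repost, please cite this site's URL or the original address. For any questions, contact yoyou2525@163.com.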
