How can I use a regex to extract the string between a key substring and "/"?

Question

I have a string like this:

/path/to/file?_subject_ID_SOMEOTHERSTRING

The path/to/file part changes depending on the situation, but the ?_subject_ID marker is always there. I am trying to write a regex that extracts only the file part of the string. I can rely on ?_subject_ID being present, but I don't know how to safely get file.

My current regex looks like (.*[\/]).*\?_subject_ID

import re

url = '/path/to/file?_subject_ID_SOMEOTHERSTRING'
file_re = re.compile(r'(.*[\/]).*\?_subject_ID')
file_re.search(url)

This finds the right string, but I still can't extract the file name.

Printing _.group(1) gives me /path/to/ . What's the next step to get the actual file name?

Answer 1

As for your '(.*[\/]).*\?_subject_ID' regex approach, you just need to add a capturing group around the second .* . You could use r'(.*/)(.*)\?_subject_ID' (then both .group(1) and .group(2) will be captured), but a regex is not the most appropriate way to parse URLs in Python.
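As a minimal sketch of the two-group regex described above, run against the sample string from the question (the variable names directory and filename are only illustrative):

```python
import re

url = '/path/to/file?_subject_ID_SOMEOTHERSTRING'

# Group 1: everything up to and including the last "/".
# Group 2: whatever sits between that "/" and the literal "?_subject_ID".
file_re = re.compile(r'(.*/)(.*)\?_subject_ID')

match = file_re.search(url)
if match:
    directory, filename = match.group(1), match.group(2)
    print(directory)  # /path/to/
    print(filename)   # file
```

Because the first .* is greedy, group 1 runs to the last slash that still lets the rest of the pattern match, which is why group 2 holds only the file name.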

You may prefer a non-regex approach here. The snippet below shows how to use urlparse (urllib.parse in Python 3) and os.path to parse a URL like yours:

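from urllib.parse import urlparse  # Python 3; in Python 2: from urlparse import urlparse
import os.path

# urlparse separates the path from the query string that follows "?"
path = urlparse('/path/to/file?_subject_ID_SOMEOTHERSTRING').path
print(os.path.split(path)[1])  # => file
print(os.path.split(path)[0])  # => /path/to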

See the IDEONE demo.

Answer 2

It's pretty simple, really. Just match the part that comes after a / and before ?_subject_ID:

([^/?]*)\?_subject_ID

The [^/?]* (as opposed to .*) is there because otherwise it would match the preceding directories too. The ? in the character class excludes question marks, so the captured name stops at the query separator.
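To make the difference between the negated character class and a plain .* concrete, here is a small sketch. It assumes the marker is spelled ?_subject_ID, as in the sample string from the question; the names strict and greedy are only illustrative.

```python
import re

url = '/path/to/file?_subject_ID_SOMEOTHERSTRING'

# Negated class: the capture cannot cross a "/" or a "?".
strict = re.search(r'([^/?]*)\?_subject_ID', url)
# Plain ".*": the capture is free to swallow the directories as well.
greedy = re.search(r'(.*)\?_subject_ID', url)

print(strict.group(1))  # file
print(greedy.group(1))  # /path/to/file
```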

If you want both the path and the file, you can do much the same thing, but also capture the part up to and including the last / :

([^?]*/)([^/?]*)\?_subject_ID

It's basically the same as the one before, but with the leading part captured instead of ignored.
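A sketch of the two-part capture wrapped in a function. The helper name split_path_and_file is hypothetical, and the pattern assumes the ?_subject_ID spelling from the sample string and a slash closing the first group; returning None when the marker is missing is one possible choice, not something the answer prescribes.

```python
import re

# Hypothetical helper and constant names, used only for this illustration.
PATTERN = re.compile(r'([^?]*/)([^/?]*)\?_subject_ID')

def split_path_and_file(url):
    """Return (directory, filename), or None if the marker is absent."""
    match = PATTERN.search(url)
    if match is None:
        return None
    return match.group(1), match.group(2)

print(split_path_and_file('/path/to/file?_subject_ID_SOMEOTHERSTRING'))
# ('/path/to/', 'file')
print(split_path_and_file('/path/to/file'))
# None
```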

Answer 1: 3 votes, accepted, 2016-06-27 22:43:23
Answer 2: 2 votes, 2016-06-27 22:37:44
