How to extract the string between a key substring and “/” with regex?
I have a string like this:
/path/to/file?_subject_ID_SOMEOTHERSTRING
The path/to/file part changes depending on the situation, but subject_ID is always there. I am trying to write a regex that extracts only the file part of the string. Matching on ?_subject_ID is a given, but I don't know how to safely get file.
My current regex looks like (.*[\/]).*\?_subject_ID:
import re

url = '/path/to/file?_subject_ID_SOMEOTHERSTRING'
file_re = re.compile(r'(.*[\/]).*\?_subject_ID')
file_re.search(url)
This finds the right string, but I still can't extract the file name. Printing _.group(1) gets me /path/to/. What's the next step that gets me the actual file name?
As for your '(.*[\/]).*\?_subject_ID' regex approach, you just need to add a capturing group around the second .*. You could use r'(.*/)(.*)\?_subject_ID' (then both .group(1) and .group(2) will be captured), but a regex is not the most appropriate way to parse URLs in Python.
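To make the two-group suggestion concrete, here is a minimal sketch run against the sample URL from the question. The variable names directory and filename are only illustrative.

```python
import re

url = '/path/to/file?_subject_ID_SOMEOTHERSTRING'

# Group 1: everything up to and including the last "/"
# Group 2: whatever sits between that "/" and "?_subject_ID"
file_re = re.compile(r'(.*/)(.*)\?_subject_ID')

match = file_re.search(url)
if match:
    directory, filename = match.group(1), match.group(2)
    print(directory)  # /path/to/
    print(filename)   # file
```

The if guard matters in practice: search() returns None when the marker is missing, and calling .group() on None raises AttributeError.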
You can use a non-regex approach here. The snippet below shows how to leverage urlparse and os.path to parse a URL like yours:
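import os.path
from urllib.parse import urlparse  # Python 3; in Python 2 this lived in the urlparse module

# .path drops the query string and keeps only the path component
path = urlparse('/path/to/file?_subject_ID_SOMEOTHERSTRING').path
print(os.path.split(path)[1]) # => file
print(os.path.split(path)[0]) # => /path/to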
See the IDEONE demo.
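The non-regex route does not check that the subject_ID marker is actually present. If that check matters, one possible sketch is below. The helper name file_name_before_marker and its marker parameter are hypothetical, not part of the answer, and it assumes the marker is the very start of the query string, as in the sample URL.

```python
import posixpath
from urllib.parse import urlsplit


def file_name_before_marker(url, marker='_subject_ID'):
    """Return the last path segment if the query string starts with marker, else None."""
    parts = urlsplit(url)
    if not parts.query.startswith(marker):
        return None
    # posixpath always treats "/" as the separator, regardless of the host OS
    return posixpath.basename(parts.path)


print(file_name_before_marker('/path/to/file?_subject_ID_SOMEOTHERSTRING'))  # file
print(file_name_before_marker('/path/to/file?other=1'))  # None
```

posixpath is used here instead of os.path so that URL paths are split on "/" on every platform.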
It's pretty simple, really. Just match everything after the last / and before ?_subject_ID:
([^/?]*)\?_subject_ID
The [^/?]* (as opposed to .*) is there because otherwise it would match the part before the /, too. The ? in the character class keeps the capture from running past a question mark.
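The difference between the character class and .* is easy to see side by side. This is a small sketch that assumes the marker is written as ?_subject_ID, as it appears in the sample URL from the question.

```python
import re

url = '/path/to/file?_subject_ID_SOMEOTHERSTRING'

# [^/?]* cannot cross a "/" or a "?", so the capture is limited to
# the final path segment right before the marker.
narrow = re.search(r'([^/?]*)\?_subject_ID', url)

# .* has no such limit and grabs everything from the start of the string.
greedy = re.search(r'(.*)\?_subject_ID', url)

print(narrow.group(1))  # file
print(greedy.group(1))  # /path/to/file
```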
If you want to get both the path and the file, you can do much the same thing, but also grab the part up to and including the last /:
([^?]*/)([^/?]*)\?_subject_ID
It's basically the same as the one before, but with the first bit captured instead of ignored.
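A short sketch of the two-group version follows, again assuming the ?_subject_ID spelling from the sample URL. Note that the first group needs to end on a literal /. Without it, the greedy [^?]* swallows the file name and leaves the second group empty, as the last line shows.

```python
import re

url = '/path/to/file?_subject_ID_SOMEOTHERSTRING'

# Group 1 ends at the last "/"; group 2 is the segment after it.
both_re = re.compile(r'([^?]*/)([^/?]*)\?_subject_ID')
m = both_re.search(url)
print(m.group(1))  # /path/to/
print(m.group(2))  # file

# Without the "/" in group 1, the first group takes the file name too.
no_slash = re.search(r'([^?]*)([^/?]*)\?_subject_ID', url)
print(no_slash.groups())  # ('/path/to/file', '')
```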