
How to extract string between key substring and "/" with regex?

I have a string like this:

/path/to/file?_subject_ID_SOMEOTHERSTRING

The path/to/file part changes depending on the situation, but _subject_ID is always there. I am trying to write a regex that extracts only the file part of the string. Anchoring on ?_subject_ID is reliable, but I don't know how to safely get file.

My current regex looks like (.*[\/]).*\?_subject_ID

import re

url = '/path/to/file?_subject_ID_SOMEOTHERSTRING'
# Raw string so the backslashes reach the regex engine unchanged
file_re = re.compile(r'(.*[\/]).*\?_subject_ID')
file_re.search(url)

This finds the right string, but I still can't extract the file name.

Printing _.group(1) gives me /path/to/ . What's the next step to get the actual file name?

As for your '(.*[\/]).*\?_subject_ID' regex approach, you just need to add a capturing group around the second .* . You could use r'(.*/)(.*)\?_subject_ID' (then both .group(1) and .group(2) will be captured), but it is not the most appropriate way to parse URLs in Python.
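As a minimal sketch of the two-group variant, assuming the sample URL from the question (the variable names are only illustrative):

```python
import re

url = '/path/to/file?_subject_ID_SOMEOTHERSTRING'

# Group 1: everything up to and including the last "/".
# Group 2: the text between that "/" and "?_subject_ID".
file_re = re.compile(r'(.*/)(.*)\?_subject_ID')
match = file_re.search(url)

if match:
    directory, filename = match.group(1), match.group(2)
    print(directory)  # /path/to/
    print(filename)   # file
```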

You may use a non-regex approach here. The snippet below shows how to use urlparse and os.path to parse a URL like yours:

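import os.path
from urllib.parse import urlparse  # Python 3; on Python 2 use: from urlparse import urlparse

# urlparse separates the path from the query string that follows "?"
path = urlparse('/path/to/file?_subject_ID_SOMEOTHERSTRING').path
print(os.path.split(path)[1]) # => file
print(os.path.split(path)[0]) # => /path/to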

See the IDEONE demo.

It's pretty simple, really. Just match the text that has a / before it and ?_subject_ID after it:

([^/?]*)\?_subject_ID

The [^/?]* (as opposed to .* ) is there because otherwise it would match the part before the / as well. The ? in the character class keeps the match from running past the question mark.
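A short sketch of how this pattern behaves with re.search, assuming the key is spelled ?_subject_ID as in the sample URL from the question:

```python
import re

url = '/path/to/file?_subject_ID_SOMEOTHERSTRING'

# [^/?]* cannot cross a "/" or a "?", so the capture is limited to the
# last path segment directly before the "?_subject_ID" marker.
match = re.search(r'([^/?]*)\?_subject_ID', url)
filename = match.group(1) if match else None
print(filename)  # file
```

If the marker is missing from the string, re.search returns None, so filename ends up as None rather than raising an error.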

If you want to get both the path and the file, you can do much the same thing, but also grab the part before the / :

([^?]*/)([^/?]*)\?_subject_ID

It's basically the same as the one before, but with the first part captured instead of ignored.
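A sketch of the two-capture version follows. It assumes the ?_subject_ID spelling from the sample URL, and it ends the first group with a literal / so that the greedy [^?]* stops at the last slash instead of swallowing the file name. Both details are assumptions based on the question's example.

```python
import re

url = '/path/to/file?_subject_ID_SOMEOTHERSTRING'

# Group 1: the directory part, ending at the last "/".
# Group 2: the file name, which may not contain "/" or "?".
pattern = re.compile(r'([^?]*/)([^/?]*)\?_subject_ID')
match = pattern.search(url)

if match:
    directory, filename = match.groups()
    print(directory)  # /path/to/
    print(filename)   # file
```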
