Using a Regular Expression to Grab All Text Between Two Specific Characters
I have a URL that contains a filename. I would like to create a function that uses a regular expression to isolate the filename and then save it as a variable. Setting up the function and saving the string as a variable is fairly straightforward. I am struggling with the regular expression to isolate the string.
Below is an example of a URL that I am working with.
http://some-website.s3.amazonaws.com/lovecraft-05.epub?AWSAccessKeyId=KJHFHGFDSXF&Expires=3568732&Signature=%3JHF%3KUHF%2Bnuvnu%5LHF%3D
I would like to grab the filename located between the last "/" and the "?".
So the value I am looking for is "lovecraft-05.epub".
Text:
http://some-website.s3.amazonaws.com/lovecraft-05.epub?AWSAccessKeyId=KJHFHGFDSXF&Expires=3568732&Signature=%3JHF%3KUHF%2Bnuvnu%5LHF%3D
Regex (with Perl):
\.com\/(.*)\?
Output:
Match 1: .com/lovecraft-05.epub? (start 32, length 23)
Group 1: lovecraft-05.epub (start 37, length 17)
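As a rough PHP sketch of the same match, the snippet below runs the pattern with `PREG_OFFSET_CAPTURE` so that the start and length columns above can be reproduced. It uses `~` as the delimiter so the slash needs no escaping, and the `$result` array layout is only an illustrative assumption. Note that `(.*)` is greedy, so a URL with a second "?" further along would make the group run up to the last one.

```php
<?php
// Sample URL from the question.
$url = 'http://some-website.s3.amazonaws.com/lovecraft-05.epub?AWSAccessKeyId=KJHFHGFDSXF&Expires=3568732&Signature=%3JHF%3KUHF%2Bnuvnu%5LHF%3D';

// With PREG_OFFSET_CAPTURE each entry of $m is a [text, byte offset] pair.
$result = [];
if (preg_match('~\.com/(.*)\?~', $url, $m, PREG_OFFSET_CAPTURE)) {
    foreach ($m as $i => [$text, $offset]) {
        // Index 0 is the whole match, index 1 is capture group 1.
        $result[$i] = ['text' => $text, 'start' => $offset, 'length' => strlen($text)];
    }
}

print_r($result);
```

Here `$result[1]['text']` holds the filename, and the start and length values correspond to the figures listed in the output above.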
You can use /\\/([^\\/?]+)\\?/.
The Perl one-liner
echo "http://some-website.s3.amazonaws.com/lovecraft-05.epub?AWS?AccessKeyId=KJHFHGFDSXF&Expires=3568732&Signature=%3JHF%3KUHF%2Bnuvnu%5LHF%3D" \
| perl -ne 'print $1 if m=/([^/?]+)\?='
returns lovecraft-05.epub.
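Since the question asks for a function, here is a minimal PHP sketch of the same idea as the one-liner. The function name `filename_before_query` and the choice to return null when nothing matches are assumptions made for illustration, not part of the original answer.

```php
<?php
// Hypothetical helper: returns the path segment that is immediately
// followed by "?", or null when the URL contains no such segment.
function filename_before_query(string $url): ?string
{
    // "/" , then one or more characters that are neither "/" nor "?",
    // then a literal "?". The negated class stops at the first "?".
    return preg_match('~/([^/?]+)\?~', $url, $m) ? $m[1] : null;
}

// Same kind of input as the one-liner, including a stray "?" in the query.
echo filename_before_query(
    'http://some-website.s3.amazonaws.com/lovecraft-05.epub?AWS?AccessKeyId=KJHFHGFDSXF&Expires=3568732'
), "\n";
```

Because the character class excludes "?", the capture ends at the first question mark even when the query string contains another one. A URL without a query string does not match at all with this pattern.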
I see two ways to do that:
function get_filename_from_url($url) {
return ltrim(strrchr(parse_url($url, PHP_URL_PATH), '/'), '/');
}
or with preg_match:
function get_filename_from_url($url) {
return preg_match('~(?<!:/)/\K[^/]*?(?=[?#]|$)~', $url, $m) ? $m[0] : '';
}
where the pattern means:
~ # pattern delimiter
(?<!:/) # not preceded by :/
/ # literal slash
\K # discard character(s) on the left from the match result
[^/]*? # zero or more characters that are not a slash
(?=[?#]|$) # followed by a ? or a # or the end of the string
~
Note that I have chosen to return the empty string by default when the URL isn't well formed. Obviously, you can choose a different behaviour.
In the regex approach, testing for # or the end of the string in addition to the question mark is needed because the query part of a URL is optional. If the query part is absent, the filename can be followed by the fragment part or by the end of the string.
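To illustrate the three endings discussed above, the following sketch runs both approaches on a URL with a query, one with only a fragment, and one with neither. The function names and the example.com URLs are made up for the illustration. The function bodies are the two shown earlier.

```php
<?php
// Hypothetical names; the bodies are the two functions from the answer.
function filename_via_parse_url(string $url): string
{
    return ltrim(strrchr(parse_url($url, PHP_URL_PATH), '/'), '/');
}

function filename_via_regex(string $url): string
{
    return preg_match('~(?<!:/)/\K[^/]*?(?=[?#]|$)~', $url, $m) ? $m[0] : '';
}

// Illustrative URLs (assumed, not from the question).
$urls = [
    'http://example.com/dir/book.epub?key=1', // filename followed by a query
    'http://example.com/dir/book.epub#ch2',   // filename followed by a fragment
    'http://example.com/dir/book.epub',       // filename at the end of the string
];

foreach ($urls as $url) {
    echo filename_via_parse_url($url), ' | ', filename_via_regex($url), "\n";
}
```

Both functions should yield book.epub for each of these inputs. The sketch does not cover URLs without a path, where parse_url() returns null for the path component and the first function would need an extra check.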