I have a URL that contains a filename. I would like to create a function that uses a regular expression to isolate the filename and then saves it in a variable. Setting up the function and saving the string in a variable is fairly straightforward; I am struggling with the regular expression to isolate the string.
Below is an example of a URL that I am working with.
I would like to grab the filename located between the last "/" and the "?".
So the value I am looking for is "lovecraft-05.epub".
Text
http://some-website.s3.amazonaws.com/lovecraft-05.epub?AWSAccessKeyId=KJHFHGFDSXF&Expires=3568732&Signature=%3JHF%3KUHF%2Bnuvnu%5LHF%3D
Regex (with Perl):
\.com\/(.*)\?
Output
Match 1: .com/lovecraft-05.epub? 32 23
Group 1: lovecraft-05.epub 37 17
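As a cross-check of the offsets listed above, here is a minimal Python sketch. Python is used only for illustration (the answer itself used a Perl-flavoured regex tester), and the names `url`, `m`, `full_match` and `filename` are assumptions made for this example.

```python
import re

url = ("http://some-website.s3.amazonaws.com/lovecraft-05.epub"
       "?AWSAccessKeyId=KJHFHGFDSXF&Expires=3568732"
       "&Signature=%3JHF%3KUHF%2Bnuvnu%5LHF%3D")

# Same pattern as above; "/" needs no escaping in a Python regex.
m = re.search(r"\.com/(.*)\?", url)

full_match = m.group(0)  # whole match, including ".com/" and the "?"
filename = m.group(1)    # capture group 1: the file name only

# Print text, start offset and length for the match and for group 1.
print(full_match, m.start(0), len(full_match))
print(filename, m.start(1), len(filename))
```

The value to store in a variable is the content of capture group 1, not the whole match.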
This regex selects the substring after the string amazonaws.com and before the ? character:
amazonaws.com\/([^\?]+)
In your code, read the result from group(1) of the match.
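One practical difference between `(.*)` followed by `\?` and the negated class `([^?]+)` shows up when the URL contains more than one `?`: the greedy `.*` runs up to the last `?`, while the negated class stops before the first. A minimal Python sketch, used only for illustration; the URL below is a made-up variant with a second `?`, and the variable names are assumptions.

```python
import re

# Hypothetical URL with a second "?" inside the query string.
url = "http://some-website.s3.amazonaws.com/lovecraft-05.epub?a=1&b=x?y"

# Greedy ".*" backtracks only as far as the LAST "?" in the string.
greedy = re.search(r"\.com/(.*)\?", url).group(1)

# "[^?]+" cannot cross a "?", so it stops before the FIRST one.
negated = re.search(r"amazonaws\.com/([^?]+)", url).group(1)

print(greedy)
print(negated)
```

For a URL like the one in the question, which has a single `?`, both patterns capture the same text.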
You can use /\/([^\/?]+)\?/:
The Perl one-liner
echo "http://some-website.s3.amazonaws.com/lovecraft-05.epub?AWS?AccessKeyId=KJHFHGFDSXF&Expires=3568732&Signature=%3JHF%3KUHF%2Bnuvnu%5LHF%3D" \
| perl -ne 'print $1 if m=/([^/?]+)\?='
returns lovecraft-05.epub.
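Since the question asks for a function whose result is saved in a variable, the same pattern can be wrapped as follows. This is a minimal sketch in Python, used only for illustration; the function name `filename_before_query` is hypothetical, and returning `None` when nothing matches is an assumption about the desired behaviour.

```python
import re
from typing import Optional

def filename_before_query(url: str) -> Optional[str]:
    # Capture the first path segment that is immediately followed by "?".
    m = re.search(r"/([^/?]+)\?", url)
    return m.group(1) if m else None

# Same URL as in the one-liner above, including the extra "?".
url = ("http://some-website.s3.amazonaws.com/lovecraft-05.epub"
       "?AWS?AccessKeyId=KJHFHGFDSXF&Expires=3568732"
       "&Signature=%3JHF%3KUHF%2Bnuvnu%5LHF%3D")

filename = filename_before_query(url)
print(filename)
```

Note that this pattern requires a `?` after the filename, so a URL without a query string yields no match.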
I see two ways to do that:
function get_filename_from_url($url) {
    return ltrim(strrchr(parse_url($url, PHP_URL_PATH), '/'), '/');
}
or with preg_match:
function get_filename_from_url($url) {
    return preg_match('~(?<!:/)/\K[^/]*?(?=[?#]|$)~', $url, $m) ? $m[0] : '';
}
where the pattern means:
~ # pattern delimiter
(?<!:/) # not preceded by :/
/ # literal slash
\K # discard character(s) on the left from the match result
[^/]*? # zero or more characters that are not a slash
(?=[?#]|$) # followed by a ? or a # or the end of the string
~
Note that I have chosen to return the empty string by default when the URL isn't well formed; you can of course choose a different behaviour.
With the regex approach, testing for # or the end of the string in addition to the question mark is needed because the query part of a URL is optional. If the query part is absent, the filename can be followed by the fragment part or by the end of the string.
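To see how the two approaches behave when the query part is missing, here is a rough Python transcription of both, offered as a sketch rather than an exact port of the PHP. Assumptions: the function names are hypothetical, `urlsplit` plus `posixpath.basename` stands in for the `parse_url`/`strrchr` combination, and a capture group replaces `\K`, which Python's `re` module does not support.

```python
import re
from posixpath import basename
from urllib.parse import urlsplit

def filename_via_urlsplit(url: str) -> str:
    # Let the URL parser isolate the path, then keep what follows the last "/".
    return basename(urlsplit(url).path)

# Same idea as the PHP pattern, with a capture group instead of \K.
_FILENAME_RE = re.compile(r"(?<!:/)/([^/]*?)(?=[?#]|$)")

def filename_via_regex(url: str) -> str:
    m = _FILENAME_RE.search(url)
    # Empty string when nothing matches, mirroring the PHP default.
    return m.group(1) if m else ""

# Made-up URLs: with a query, with a fragment only, bare path, no path.
samples = [
    "http://example.com/dir/file.epub?key=1&next=/home",
    "http://example.com/file.epub#page=2",
    "http://example.com/dir/file.epub",
    "http://example.com",
]
for sample in samples:
    print(filename_via_urlsplit(sample), filename_via_regex(sample))
```

On these sample URLs the two functions agree, returning the empty string when the URL has no path at all.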