Using a regular expression to grab all text between two specific characters

I have a URL that contains a filename. I would like to create a function that uses a regular expression to isolate the filename and then save it as a variable. Setting up the function and saving the string as a variable is fairly straightforward. I am struggling with the regular expression that isolates the string.

Below is an example of a URL that I am working with.

http://some-website.s3.amazonaws.com/lovecraft-05.epub?AWSAccessKeyId=KJHFHGFDSXF&Expires=3568732&Signature=%3JHF%3KUHF%2Bnuvnu%5LHF%3D

I would like to grab the filename located between the last "/" and the "?".

So the value I am looking for is "lovecraft-05.epub".

Text

http://some-website.s3.amazonaws.com/lovecraft-05.epub?AWSAccessKeyId=KJHFHGFDSXF&Expires=3568732&Signature=%3JHF%3KUHF%2Bnuvnu%5LHF%3D

Regex (with Perl):

\.com\/(.*)\?

Output

            Text                        Start   Length
Match 1:    .com/lovecraft-05.epub?     32      23
Group 1:    lovecraft-05.epub           37      17
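
One caveat worth checking: (.*) is greedy, so it runs up to the last "?" in the string rather than the first. The sketch below is in Python, chosen only for illustration; the second URL is a made-up variant with an extra "?", and the lazy pattern is an assumption added for comparison, not part of the answer.

```python
import re

# URL from the question; it contains exactly one "?".
ONE_Q = (
    "http://some-website.s3.amazonaws.com/lovecraft-05.epub"
    "?AWSAccessKeyId=KJHFHGFDSXF&Expires=3568732"
    "&Signature=%3JHF%3KUHF%2Bnuvnu%5LHF%3D"
)
# Hypothetical variant with a second "?" inside the query string.
TWO_Q = ONE_Q.replace("AWSAccessKeyId", "AWS?AccessKeyId")

GREEDY = re.compile(r"\.com/(.*)\?")   # pattern from the answer above
LAZY = re.compile(r"\.com/(.*?)\?")    # assumption: lazy variant for comparison


def group1(pattern, url):
    """Return capture group 1, or None when the pattern does not match."""
    match = pattern.search(url)
    return match.group(1) if match else None


for url in (ONE_Q, TWO_Q):
    print(group1(GREEDY, url), "|", group1(LAZY, url))
```

With a single "?" both patterns agree. With the extra "?" the greedy pattern also captures part of the query string, so the lazy form (or a negated class such as [^?]+) is probably the safer choice.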

This regex selects the substring after amazonaws.com/ and before the ? character (the dots are escaped so that they only match a literal "."):

amazonaws\.com\/([^?]+)

In your code, read the result from capture group 1 of the match.
See DEMO for explanation.
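
As a rough illustration of reading group 1 in code, here is a minimal Python sketch. Python is used only as an example language, and the function name filename_from_s3_url is hypothetical.

```python
import re

# Example URL copied from the question.
URL = (
    "http://some-website.s3.amazonaws.com/lovecraft-05.epub"
    "?AWSAccessKeyId=KJHFHGFDSXF&Expires=3568732"
    "&Signature=%3JHF%3KUHF%2Bnuvnu%5LHF%3D"
)

# Host-anchored pattern; the dots are escaped so they match literally.
PATTERN = re.compile(r"amazonaws\.com/([^?]+)")


def filename_from_s3_url(url):
    """Return capture group 1, or None when the pattern does not match."""
    match = PATTERN.search(url)
    return match.group(1) if match else None


print(filename_from_s3_url(URL))  # lovecraft-05.epub
```

Note that [^?]+ also accepts slashes, so for a URL with subdirectories this pattern would return the whole path after the host, not just the last segment.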

You can use /\/([^\/?]+)\?/ and read capture group 1.

The Perl one-liner

echo "http://some-website.s3.amazonaws.com/lovecraft-05.epub?AWS?AccessKeyId=KJHFHGFDSXF&Expires=3568732&Signature=%3JHF%3KUHF%2Bnuvnu%5LHF%3D" \
| perl -ne 'print $1 if m=/([^/?]+)\?='

returns lovecraft-05.epub.

I see two ways to do that:

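function get_filename_from_url($url) {
    // Take the path component, then keep what follows its last slash.
    $path = parse_url($url, PHP_URL_PATH);
    return is_string($path) ? ltrim((string) strrchr($path, '/'), '/') : '';
}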

or with preg_match :

function get_filename_from_url($url) {
    return preg_match('~(?<!:/)/\K[^/]*?(?=[?#]|$)~', $url, $m) ? $m[0] : '';
}

where the pattern means:

~           # pattern delimiter
(?<!:/)     # not preceded by :/
/           # literal slash
\K          # discard character(s) on the left from the match result
[^/]*?      # as few non-slash characters as possible (lazy)
(?=[?#]|$)  # followed by a ? or a # or the end of the string
~

Note that I have chosen to return the empty string by default when the URL isn't well formed. You can of course choose a different behaviour.

In the regex approach, testing for # or the end of the string in addition to the question mark is needed because the query part of a URL is optional. If there is no query part, the filename is followed either by the fragment part or by the end of the string.
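
To make the optional-query case concrete, here is a small Python sketch of both approaches (Python is used only for illustration; the example.com URLs and the function names are made up). Python's re module does not support \K, so a capture group stands in for it, and the character class also excludes ? and # so that a greedy quantifier can be used.

```python
import posixpath
import re
from urllib.parse import urlsplit


def filename_via_parser(url):
    """Parse the URL, then take the last segment of its path."""
    return posixpath.basename(urlsplit(url).path)


# Adapted pattern (assumption): capture group instead of \K.
FILENAME_RE = re.compile(r"(?<!:/)/([^/?#]*)(?=[?#]|$)")


def filename_via_regex(url):
    """Return the last path segment, or '' when nothing matches."""
    match = FILENAME_RE.search(url)
    return match.group(1) if match else ""


# Hypothetical URLs: with a query, with only a fragment, with neither,
# and with no path at all.
SAMPLES = [
    "http://example.com/dir/book.epub?key=1",
    "http://example.com/dir/book.epub#chapter-2",
    "http://example.com/dir/book.epub",
    "http://example.com",
]

for sample in SAMPLES:
    print(filename_via_parser(sample), "|", filename_via_regex(sample))
```

The first three samples give book.epub with both functions, and the last one gives an empty string, which mirrors the default chosen above.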
