Using a Regular Expression to Grab all text in between two specific characters

Question

I have a URL that contains a filename. I would like to create a function that uses a regular expression to isolate the filename and then save it as a variable. Setting up the function and saving the string as a variable is fairly straightforward. I am struggling with the regular expression that isolates the string.

Below is an example of a URL that I am working with.

http://some-website.s3.amazonaws.com/lovecraft-05.epub?AWSAccessKeyId=KJHFHGFDSXF&Expires=3568732&Signature=%3JHF%3KUHF%2Bnuvnu%5LHF%3D

I would like to grab the filename located between the last "/" and the "?".

So the value I am looking for is "lovecraft-05.epub".

Answer 1

Text

http://some-website.s3.amazonaws.com/lovecraft-05.epub?AWSAccessKeyId=KJHFHGFDSXF&Expires=3568732&Signature=%3JHF%3KUHF%2Bnuvnu%5LHF%3D

Regex (with Perl):

\.com\/(.*)\?

Output

Match 1:    .com/lovecraft-05.epub?     32      23
Group 1:    lovecraft-05.epub       37      17
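As a cross-check of the offsets above, here is a minimal Python sketch (Python is my choice for illustration, not the answer's language). It also shows one thing to watch for: `.*` is greedy, so if the query string ever contains a second "?", group 1 runs up to the last one. The `tricky_url` value and the lazy variant `(.*?)` are assumptions added for the example, not part of the original answer.

```python
import re

# URL from the question.
url = ("http://some-website.s3.amazonaws.com/lovecraft-05.epub"
       "?AWSAccessKeyId=KJHFHGFDSXF&Expires=3568732"
       "&Signature=%3JHF%3KUHF%2Bnuvnu%5LHF%3D")

# Hypothetical URL with a second "?" inside the query string.
tricky_url = "http://some-website.s3.amazonaws.com/lovecraft-05.epub?title=who?&x=1"

greedy = re.compile(r"\.com/(.*)\?")   # pattern from this answer
lazy = re.compile(r"\.com/(.*?)\?")    # assumed variant: stop at the first "?"

m = greedy.search(url)
print(m.start(), m.group(0))    # 32 .com/lovecraft-05.epub?
print(m.start(1), m.group(1))   # 37 lovecraft-05.epub

print(greedy.search(tricky_url).group(1))  # lovecraft-05.epub?title=who
print(lazy.search(tricky_url).group(1))    # lovecraft-05.epub
```

For the URL in the question both patterns give the same result, since it contains only one "?".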

Answer 2

This regex selects the substring after amazonaws.com and before the ? character:

amazonaws\.com\/([^?]+)

In code, read the filename from capture group 1 of the match.
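Reading group 1 might look like the following in Python. This is only a sketch: the function name `filename_after_host` is hypothetical, and the dot in `amazonaws\.com` is escaped here so that it matches a literal dot.

```python
import re

def filename_after_host(url):
    """Return the text between 'amazonaws.com/' and the first '?', or None."""
    match = re.search(r"amazonaws\.com/([^?]+)", url)
    # group(0) is the whole match; group(1) is the parenthesised part.
    return match.group(1) if match else None

url = ("http://some-website.s3.amazonaws.com/lovecraft-05.epub"
       "?AWSAccessKeyId=KJHFHGFDSXF&Expires=3568732")
print(filename_after_host(url))  # lovecraft-05.epub
```

Because the character class only excludes "?", a URL with directories in its path (for example a hypothetical `.../books/lovecraft-05.epub?...`) would return `books/lovecraft-05.epub`, not just the filename.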

Answer 3

You can use /\/([^\/?]+)\?/:

The Perl one-liner

echo "http://some-website.s3.amazonaws.com/lovecraft-05.epub?AWSAccessKeyId=KJHFHGFDSXF&Expires=3568732&Signature=%3JHF%3KUHF%2Bnuvnu%5LHF%3D" \
| perl -ne 'print $1 if m=/([^/?]+)\?='

returns lovecraft-05.epub.
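The same pattern does not depend on the host name, which a rough Python port can illustrate. The function name and the example.org URLs are made up for the sketch.

```python
import re

# Same idea as the Perl pattern: a "/" followed by characters that are
# neither "/" nor "?", immediately followed by a "?".
SEGMENT_BEFORE_QUERY = re.compile(r"/([^/?]+)\?")

def last_segment_before_query(url):
    match = SEGMENT_BEFORE_QUERY.search(url)
    return match.group(1) if match else None

print(last_segment_before_query("https://example.org/a/b/report.pdf?download=1"))  # report.pdf
print(last_segment_before_query("https://example.org/report.pdf"))                 # None
```

The second call returns None because this pattern requires a "?" after the filename.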

Answer 4

I see two ways to do that:

function get_filename_from_url($url) {
    return ltrim(strrchr(parse_url($url, PHP_URL_PATH), '/'), '/');
}

or with preg_match :

function get_filename_from_url($url) {
    return preg_match('~(?<!:/)/\K[^/]*?(?=[?#]|$)~', $url, $m) ? $m[0] : '';
}

where the pattern means:

~           # pattern delimiter
(?<!:/)     # not preceded by :/
/           # literal slash
\K          # discard character(s) on the left from the match result
[^/]*?      # zero or more characters that are not a slash
(?=[?#]|$)  # followed by a ? or a # or the end of the string
~

Note that I have chosen to return the empty string by default when the URL isn't well formed; you can of course choose a different behaviour.

With the regex approach, testing for # or the end of the string in addition to the question mark is needed because the query part of a URL is optional. If the query part is absent, the filename can be followed by the fragment part or by the end of the string.
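For comparison, both approaches can be sketched in Python. This is an illustration under assumptions, not a translation the answer provides: the function names are hypothetical, `urlsplit` plays the role of `parse_url`, and since Python's `re` module has no `\K`, the filename is taken from a capture group instead.

```python
import posixpath
import re
from urllib.parse import urlsplit

def filename_from_url(url):
    """Parser-based: basename of the URL path ('' if there is none)."""
    return posixpath.basename(urlsplit(url).path)

# "/" not preceded by ":/", then as few non-slash characters as possible,
# followed by "?", "#", or the end of the string.
_FILENAME = re.compile(r"(?<!:/)/([^/]*?)(?=[?#]|$)")

def filename_from_url_regex(url):
    """Regex-based: '' when no filename can be found."""
    match = _FILENAME.search(url)
    return match.group(1) if match else ""

for u in ("http://some-website.s3.amazonaws.com/lovecraft-05.epub?Expires=3568732",
          "http://example.com/docs/book.pdf#page=2",   # fragment, no query
          "http://example.com/a.txt",                  # neither
          "http://example.com"):                       # no path at all
    print(repr(filename_from_url(u)), repr(filename_from_url_regex(u)))
```

The last three URLs cover the cases described above, where the filename is followed by a fragment or by the end of the string, or is missing entirely.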

4 answers

solution1
0 2015-06-29 22:41:49

solution2
0 2015-06-29 22:42:42

solution3
0 2015-06-29 22:50:38

solution4
0 2015-06-29 23:08:42
