Using a regular expression to grab all text between two specific characters

Question

I have a URL that contains a filename. I would like to create a function that uses a regular expression to isolate the filename and then save it as a variable. Setting up the function and saving the string as a variable is fairly straightforward; what I am struggling with is the regular expression that isolates the string.

Below is an example of a URL that I am working with:

http://some-website.s3.amazonaws.com/lovecraft-05.epub?AWSAccessKeyId=KJHFHGFDSXF&Expires=3568732&Signature=%3JHF%3KUHF%2Bnuvnu%5LHF%3D

I would like to grab the filename located between "/" and "?".

So the value I am looking for is "lovecraft-05.epub".

Answer 1

Text

http://some-website.s3.amazonaws.com/lovecraft-05.epub?AWSAccessKeyId=KJHFHGFDSXF&Expires=3568732&Signature=%3JHF%3KUHF%2Bnuvnu%5LHF%3D

Regex (with Perl):

\.com\/(.*)\?

Output (matched text, start offset, length):

Match 1:    .com/lovecraft-05.epub?     32      23
Group 1:    lovecraft-05.epub       37      17
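A rough Python port of the same pattern, to show where the numbers in the output come from. It assumes Python's `re` module; the `\/` escape is only needed when `/` delimits the regex, so it is dropped here. The offsets are 0-based character positions and the second number is the match length.

```python
import re

URL = ("http://some-website.s3.amazonaws.com/lovecraft-05.epub"
       "?AWSAccessKeyId=KJHFHGFDSXF&Expires=3568732"
       "&Signature=%3JHF%3KUHF%2Bnuvnu%5LHF%3D")

# Literal ".com/", then capture everything up to a "?".
# ".*" is greedy: with several "?" in the URL it would run to the last one.
PATTERN = re.compile(r"\.com/(.*)\?")

m = PATTERN.search(URL)
if m:
    # (text, 0-based start offset, length) for the whole match and group 1
    print("Match 1:", m.group(0), m.start(0), len(m.group(0)))
    print("Group 1:", m.group(1), m.start(1), len(m.group(1)))
```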

Answer 2

This regex selects the substring after the string amazonaws.com/ and before the ? character:

amazonaws.com\/([^\?]+)

In code, you need to read the group(1) match.
See the DEMO for an explanation.
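The answer does not name a language, so here is a sketch in Python (an assumption) of what reading the group(1) match looks like. The pattern is copied unchanged; group(0) is the whole match, group(1) only the captured filename.

```python
import re

URL = ("http://some-website.s3.amazonaws.com/lovecraft-05.epub"
       "?AWSAccessKeyId=KJHFHGFDSXF&Expires=3568732"
       "&Signature=%3JHF%3KUHF%2Bnuvnu%5LHF%3D")

# Pattern from the answer: one or more non-"?" characters after
# "amazonaws.com/". (The unescaped "." there matches any character.)
m = re.search(r"amazonaws.com\/([^\?]+)", URL)

if m:
    print(m.group(0))  # whole match, including "amazonaws.com/"
    print(m.group(1))  # capture group 1: just the filename
```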

Answer 3

You can use /\/([^\/?]+)\?/:

The Perl one-liner

echo "http://some-website.s3.amazonaws.com/lovecraft-05.epub?AWSAccessKeyId=KJHFHGFDSXF&Expires=3568732&Signature=%3JHF%3KUHF%2Bnuvnu%5LHF%3D" \
| perl -ne 'print $1 if m=/([^/?]+)\?='

prints lovecraft-05.epub.
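A Python sketch of the same pattern, for readers not using Perl. The function name `filename_before_query` is hypothetical. It returns None when no non-empty run of characters sits between a "/" and a "?", for example when the URL has no query or the path ends in "/" right before it.

```python
import re

# Same pattern as the one-liner: a "/", then one or more characters that
# are neither "/" nor "?", then a literal "?".
FILENAME_BEFORE_QUERY = re.compile(r"/([^/?]+)\?")


def filename_before_query(url):
    """Return the first non-empty run between a "/" and a "?", or None."""
    m = FILENAME_BEFORE_QUERY.search(url)
    return m.group(1) if m else None


print(filename_before_query(
    "http://some-website.s3.amazonaws.com/lovecraft-05.epub?AWSAccessKeyId=KJHFHGFDSXF"))
print(filename_before_query("http://example.com/?q=1"))
```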

Answer 4

I see two ways to do that:

function get_filename_from_url($url) {
    return ltrim(strrchr(parse_url($url, PHP_URL_PATH), '/'), '/');
}
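Python's standard library offers a comparable non-regex route. This is a sketch, not a line-for-line port of the PHP above: `urllib.parse.urlsplit` separates out the query and fragment, and `str.rpartition` keeps whatever follows the last "/" of the path.

```python
from urllib.parse import urlsplit


def get_filename_from_url(url):
    """Last segment of the URL path; "" if the path is empty or ends with "/"."""
    # urlsplit puts "?..." and "#..." in their own fields, so they never
    # reach the path.
    path = urlsplit(url).path
    # Keep what follows the last "/", similar to strrchr + ltrim in PHP.
    return path.rpartition("/")[2]


print(get_filename_from_url(
    "http://some-website.s3.amazonaws.com/lovecraft-05.epub?AWSAccessKeyId=KJHFHGFDSXF"))
```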

or with preg_match:

function get_filename_from_url($url) {
    return preg_match('~(?<!:/)/\K[^/]*?(?=[?#]|$)~', $url, $m) ? $m[0] : '';
}

where the pattern means:

~           # pattern delimiter
(?<!:/)     # not preceded by :/
/           # literal slash
\K          # discard character(s) on the left from the match result
[^/]*?      # zero or more characters that are not a slash
(?=[?#]|$)  # followed by a ? or a # or the end of the string
~
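Python's `re` module does not support `\K`, so a port of this pattern needs a capture group instead. A sketch under that assumption, using `re.VERBOSE` to keep the same annotations:

```python
import re

# Port of ~(?<!:/)/\K[^/]*?(?=[?#]|$)~ to Python's re.
# There is no \K here, so the filename is taken from group 1 instead.
FILENAME = re.compile(r"""
    (?<!:/)      # the slash must not be the second one of "://"
    /            # literal slash
    ([^/]*?)     # zero or more non-slash characters, as few as possible
    (?=[?#]|$)   # followed by "?", "#" or the end of the string
""", re.VERBOSE)


def get_filename_from_url(url):
    m = FILENAME.search(url)
    # Empty string when nothing matches, as in the PHP version.
    return m.group(1) if m else ""
```

Note that "#" inside the character class `[?#]` is literal even in verbose mode; only a "#" outside a class starts a comment.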

Note that I have chosen to return the empty string by default when the URL isn't well formed; obviously, you can choose a different behaviour.

With the regex approach, testing for # or the end of the string in addition to the question mark is needed, since the query part of a URL is optional. If the query part is absent, the filename can be followed by the fragment part or by the end of the string.
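To see why the lookahead needs the `#` and end-of-string alternatives, the following sketch compares it with a variant that accepts only `?`. The example.com URLs and the helper `first_group` are hypothetical.

```python
import re

# Only accepts a filename followed by "?" (query string required).
QUERY_ONLY = re.compile(r"(?<!:/)/([^/]*?)(?=\?)")
# Also accepts "#" (fragment) or the end of the string, as in the answer.
QUERY_FRAGMENT_OR_END = re.compile(r"(?<!:/)/([^/]*?)(?=[?#]|$)")


def first_group(pattern, url):
    """Group 1 of the first match, or None if the pattern does not match."""
    m = pattern.search(url)
    return m.group(1) if m else None


urls = [
    "http://example.com/books/lovecraft-05.epub?Expires=3568732",
    "http://example.com/books/lovecraft-05.epub#chapter-2",
    "http://example.com/books/lovecraft-05.epub",
]
for url in urls:
    print(first_group(QUERY_ONLY, url), first_group(QUERY_FRAGMENT_OR_END, url))
```

The "?"-only variant finds the filename only in the first URL; the full lookahead finds it in all three.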

4 answers

Answer 1: score 0, posted 2015-06-29 22:41:49
Answer 2: score 0, posted 2015-06-29 22:42:42
Answer 3: score 0, posted 2015-06-29 22:50:38
Answer 4: score 0, posted 2015-06-29 23:08:42