Regex to extract certain URLs?

Question

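I've tried my best, but regex just isn't my thing. :(

I need to extract certain URLs that end in a particular file extension. For example, I'd like to be able to parse a large paragraph and extract every URL that ends in *.txt. For example,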

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nulla hendrerit aliquet erat at ultrices. Donec eu nunc nec nibh http://www.somesite.com/somefolder/blahblah/etc/something.txt iaculis dictum. Quisque nisi neque, vulputate quis pellentesque blandit, faucibus eget nisl.

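I need to be able to pull http://www.somesite.com/somefolder/blahblah/etc/something.txt out of the paragraph above, but the number of URLs to extract will vary. It is dynamic, depending on what the user enters. There could be 3 links that end in *.txt and 3 links that don't. I only need to extract the ones that end in *.txt. Can anyone give me the code I need for this?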

Answer 1

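You can find what you need with /(?<=\s)http:\/\/\S+\.txt(?=\s)/

Which means:

Whitespace (space, tab, or newline) before.
http://
One or more non-whitespace characters.
.txt
Whitespace (space, tab, or newline) after.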
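
To see how those lookarounds behave, here is a minimal sketch in PHP. The sample text and the second pattern are assumptions added for illustration, not part of the answer. Because (?=\s) needs a whitespace character after the URL, a link at the very end of the input is skipped; swapping in the negative forms (?<!\S) and (?!\S) is one way to also accept the start and end of the string.

```php
<?php
// Sample input (made up): one .txt URL mid-text, one .html URL, one .txt URL at the very end.
$text = "Donec eu nunc nec nibh http://www.somesite.com/somefolder/blahblah/etc/something.txt iaculis dictum. "
      . "See http://www.somesite.com/page.html and http://example.org/notes.txt";

// Pattern from the answer: a whitespace character is required on both sides.
preg_match_all('/(?<=\s)http:\/\/\S+\.txt(?=\s)/', $text, $strict);

// Assumed variant: "not preceded/followed by a non-whitespace character",
// which is also satisfied at the start and end of the string.
preg_match_all('/(?<!\S)http:\/\/\S+\.txt(?!\S)/', $text, $loose);

print_r($strict[0]); // only the something.txt URL
print_r($loose[0]);  // the something.txt URL and the notes.txt URL
```

Neither pattern matches a URL that is directly followed by punctuation such as a full stop, since both require the .txt to be the last thing before whitespace or the end of the input.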

Answer 2

Assuming these are all proper URLs, they won't contain any spaces. We can take advantage of that fact to make the regex very simple:

preg_match_all("/([^ ]+\.(txt|doc))/i", $text, $matches);
//   ([^ ]+     Match one or more characters, except for a space.
//   \.         A normal period.
//   (txt|doc)  The word "txt" or "doc".
//   )/i        Case insensitive (so TXT and TxT also work)

If you don't need to match multiple file extensions, you can change "(txt|doc)" to "txt".
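
A short run of that pattern may help show what it does and does not catch. The sample text and the second, stricter pattern below are assumptions for illustration only. Since [^ ]+ accepts any run of non-space characters, a bare file name such as notes.txt is matched as well; requiring an http:// or https:// prefix is one possible way to keep only URLs.

```php
<?php
// Sample text is made up; "notes.txt" is a bare file name, not a URL.
$text = "Read http://a.example/one.txt then http://a.example/two.DOC and notes.txt, not http://a.example/three.html";

// Pattern from the answer: any run of non-space characters ending in .txt or .doc.
preg_match_all('/([^ ]+\.(txt|doc))/i', $text, $matches);
$fromAnswer = $matches[0];

// Assumed tightening: require a scheme, and use \S so tabs and newlines also end a URL.
preg_match_all('~\bhttps?://\S+\.(?:txt|doc)\b~i', $text, $matches);
$urlsOnly = $matches[0];

print_r($fromAnswer); // both URLs plus "notes.txt"
print_r($urlsOnly);   // only the two URLs
```

Which of the two is preferable depends on whether the input can contain file names that are not links.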

$matches will contain several arrays; you want key 0 or 1. To make the array easier to read, you can use a named group:

preg_match_all("/(?P<matched_urls>[^ ]+\.(txt|doc))/i", $text, $matches);

This will make $matches look like this:

array([0] => array(), [1] => array(), [2] => array(), ["matched_urls"] => array());

It should be obvious which key you need: "matched_urls" holds the full matches.
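
As a rough usage sketch (the sample text is invented for illustration), the named key can be read directly once preg_match_all has run:

```php
<?php
// Made-up input with one .txt link and one .doc link.
$text = 'Get http://www.somesite.com/a/readme.txt and http://www.somesite.com/b/report.doc today';

preg_match_all('/(?P<matched_urls>[^ ]+\.(txt|doc))/i', $text, $matches);

// The named key holds the same values as key 1 (and, here, key 0).
foreach ($matches['matched_urls'] as $url) {
    echo $url, PHP_EOL;
}

// Key 2 holds only the captured extensions.
print_r($matches[2]);
```

Reading the result by name keeps the calling code stable if more capture groups are added to the pattern later.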

Answer 3

How about:

$str = 'Lorem ipsum dolor sit amet. Donec eu nunc nec nibh http://www.somesite.com/somefolder/blahblah/etc/something.txt. Lorem ipsum dolor sit amet. Donec eu nunc nec nibh http://www.somesite.com/somefolder/blahblah/etc/something.doc.';
preg_match_all('#\b(http://\S+\.txt)\b#', $str, $m);

Explanation:

#             : regex delimiter
\b            : word boundary
(             : begin capture group
http://       : literal http://
\S+           : one or more non space
\.            : a dot
txt           : literal txt
)             : end capture group
\b            : word boundary
#             : regex delimiter
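
Since the question says the input is dynamic, the pattern above could be wrapped in a small function. This is only a sketch: the function name extractUrlsByExtension, the https? prefix, and the case-insensitive flag are assumptions that do not appear in the answer. It reuses the same word-boundary idea, so a trailing full stop after the URL is left out of the match.

```php
<?php
// Hypothetical helper (not from the answer): builds the pattern from a list of extensions.
function extractUrlsByExtension($text, array $extensions)
{
    // Escape each extension so regex metacharacters in it are treated literally.
    $alternatives = implode('|', array_map(function ($ext) {
        return preg_quote($ext, '#');
    }, $extensions));

    // Same shape as the answer's pattern, with the extension list substituted in.
    $pattern = '#\bhttps?://\S+\.(?:' . $alternatives . ')\b#i';
    preg_match_all($pattern, $text, $m);

    return $m[0];
}

$str = 'Lorem ipsum dolor sit amet. Donec eu nunc nec nibh http://www.somesite.com/somefolder/blahblah/etc/something.txt. Lorem ipsum dolor sit amet. Donec eu nunc nec nibh http://www.somesite.com/somefolder/blahblah/etc/something.doc.';

$txtOnly = extractUrlsByExtension($str, array('txt'));
$txtAndDoc = extractUrlsByExtension($str, array('txt', 'doc'));

print_r($txtOnly);   // the something.txt URL only
print_r($txtAndDoc); // both URLs, without the trailing full stops
```

The function returns an empty array when nothing matches, so the caller can loop over the result without extra checks.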
