Regex pattern for URLs with no trailing slash that also excludes certain text in the URL

Question

I'm looking for a preg_match_all pattern to find all URLs on a page that don't have a trailing slash.

For example, if I have:

<a href="/testing/abc/">end with slash</a>

<a href="/testing/test/mnl">no ending slash</a>

The result would be #2. A solution is posted at "find pattern for url with no ending slash".

I have tried to modify the provided pattern to also exclude URLs that contain 'images' or '.pdf', but no luck yet.

Thanks.

Answer 1

This one should suit your needs (demo):

href="(?:(?<!images).(?!(?:[.]pdf|/)"))*?"

(?:) = non-capturing group
(?<!images). = any character not preceded by images
.(?!(?:[.]pdf|/)") = any character not followed by .pdf" or /"
*? = match as few characters as possible
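As a rough illustration of how this pattern could be used from PHP, the sketch below runs it through preg_match_all. The sample markup, the `~` delimiter, and the extra capture group around the URL are assumptions added for the example; they are not part of the original answer.

```php
<?php
// Hypothetical sample markup, for illustration only.
$html = <<<'HTML'
<a href="/testing/abc/">ends with slash</a>
<a href="/testing/test/mnl">no ending slash</a>
<a href="/images/logo">contains images</a>
<a href="/docs/manual.pdf">PDF link</a>
HTML;

// Answer 1's pattern, with a capture group added around the URL.
// '~' is the delimiter, so '/' needs no escaping inside the pattern.
$pattern = '~href="((?:(?<!images).(?!(?:[.]pdf|/)"))*?)"~';

preg_match_all($pattern, $html, $matches);

// $matches[1] holds the captured URLs.
$urls = $matches[1];
print_r($urls);
```

With this sample input, only /testing/test/mnl is captured. Note that the lookbehind rejects "images" only when at least one more character follows it inside the attribute, so a URL that ends in "images" would still match.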

Answer 2

I found a way to exclude links that end in .pdf by modifying the answer provided in the other question. I'm still working out why it keeps matching the images example, though.

href=(['"])[^\s]+(?<![\/]|\.pdf)\1

Link to a working test: http://www.rubular.com/r/jmBVstpGZD
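The following sketch shows one way this pattern might be run from PHP, along with a possible variant that also skips URLs containing "images". The sample markup, the variable names, the added capture group, and the leading negative lookahead in the second pattern are all assumptions made for illustration; the dot in .pdf is escaped here so that it matches a literal period.

```php
<?php
// Hypothetical sample markup, for illustration only.
$html = <<<'HTML'
<a href="/testing/abc/">ends with slash</a>
<a href='/testing/test/mnl'>no ending slash</a>
<a href="/docs/manual.pdf">PDF link</a>
<a href="/images/logo">contains images</a>
HTML;

// Answer 2's approach: group 1 is the opening quote, group 2 (added here)
// is the URL. The lookbehind rejects URLs ending in "/" or ".pdf".
$answer2 = '~href=([\'"])([^\s]+)(?<!/|\.pdf)\1~';

// Assumed variant: a negative lookahead right after the opening quote
// rejects any attribute value that contains "images".
$noImages = '~href=([\'"])(?![^\s\'"]*images)([^\s]+)(?<!/|\.pdf)\1~';

preg_match_all($answer2, $html, $m1);
preg_match_all($noImages, $html, $m2);

print_r($m1[2]); // URLs matched by the lookbehind-only pattern
print_r($m2[2]); // URLs matched once "images" is also excluded
```

On this input the first pattern still returns /images/logo, because the lookbehind only inspects the end of the URL. The lookahead in the second pattern checks the whole attribute value before any of it is consumed, which is one way to filter on text in the middle of the URL.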

Answer 1
2 accepted 2013-03-19 17:03:26

Answer 2
1 2013-03-19 17:01:30
