Regex pattern for URLs with no ending slash that excludes certain text in the URL
I'm looking for a preg_match_all pattern to find all URLs on a page that don't have a trailing slash.
For example, if I have:
1. <a href="/testing/abc/">end with slash</a>
2. <a href="/testing/test/mnl">no ending slash</a>
The result would be #2. A solution is posted at "find pattern for url with no ending slash".
I have tried to modify the provided pattern to also exclude URLs that contain 'images' or '.pdf', but no luck yet.
Thanks. 谢谢。
This one should suit your needs (demo):
href="(?:(?<!images).(?!(?:[.]pdf|/)"))*?"
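As a rough sketch, the pattern could be run through preg_match_all like this. The sample markup, the variable names, and the ~ delimiter are assumptions made for illustration (~ avoids having to escape the / inside the pattern). Only the pattern itself comes from the answer.

```php
<?php
// Hypothetical sample markup: one link per case from the question.
$html = <<<'HTML'
<a href="/testing/abc/">end with slash</a>
<a href="/testing/test/mnl">no ending slash</a>
<a href="/images/photo/large">image</a>
<a href="/files/guide.pdf">pdf</a>
HTML;

// The pattern from the answer, wrapped in ~ delimiters.
$pattern = '~href="(?:(?<!images).(?!(?:[.]pdf|/)"))*?"~';

// $matches[0] holds the full text of every match.
$count = preg_match_all($pattern, $html, $matches);

foreach ($matches[0] as $match) {
    echo $match, PHP_EOL;
}
// Prints: href="/testing/test/mnl"
```

Note that the lookbehind only rejects a character that comes directly after "images", so a URL that ends in images (with nothing after it) would still be matched.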
(?:) = non-capturing group
(?<!images). = any character not preceded by images
.(?!(?:[.]pdf|/)") = any character not followed by .pdf" nor by /"
*? = match as short as possible

I found a way to exclude links that have .pdf by modifying the answer provided in the other question. I'm still looking into why it still matches the images example, though.
href=(['"])[^\s]+(?<![\/]|.pdf)\1
Link to a working test: http://www.rubular.com/r/jmBVstpGZD
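This pattern has no condition that mentions images, which would explain why image links still get through. The sketch below shows one possible way to add that condition in PHP: a negative lookahead placed right after the opening quote. The lookahead, the sample markup, and the variable names are assumptions, not part of the posted answer. The dot before pdf is also escaped here so that it only matches a literal dot, which is a small adjustment to the pattern as posted.

```php
<?php
// Hypothetical sample markup covering each case, with both quote styles.
$html = <<<'HTML'
<a href="/testing/abc/">end with slash</a>
<a href="/testing/test/mnl">no ending slash</a>
<a href='/docs/manual.pdf'>pdf</a>
<a href="/images/logo">image</a>
HTML;

// Close to the posted pattern: rejects values ending in "/" or ".pdf".
$posted = '~href=([\'"])[^\s]+(?<![/]|\.pdf)\1~';

// Assumed extension: the lookahead rejects any attribute value that
// contains "images" before the next quote or whitespace.
$noImages = '~href=([\'"])(?![^\s\'"]*images)[^\s]+(?<![/]|\.pdf)\1~';

preg_match_all($posted, $html, $before);
preg_match_all($noImages, $html, $after);

echo implode(', ', $before[0]), PHP_EOL;
// Prints: href="/testing/test/mnl", href="/images/logo"
echo implode(', ', $after[0]), PHP_EOL;
// Prints: href="/testing/test/mnl"
```

Because [^\s]+ is greedy and stops only at whitespace, both variants assume there is whitespace somewhere after each attribute value. Tightly packed markup may need a stricter character class such as [^\s'"]+.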