
Regex pattern for URLs with no trailing slash, excluding certain text in the URL

I'm looking for a preg_match_all pattern to find all URLs on a page that don't have a trailing slash.

For example, if I have:

<a href="/testing/abc/">end with slash</a>

<a href="/testing/test/mnl">no ending slash</a>

The result would be #2. A solution for that case is posted at "find pattern for url with no ending slash".

I have tried to modify the provided pattern to also exclude URLs that contain 'images' or '.pdf', but no luck yet.

Thanks.

This one should suit your needs (demo):

href="(?:(?<!images).(?!(?:[.]pdf|/)"))*?"
  • (?:) = non-capturing group
  • (?<!images). = any character not preceded by images
  • .(?!(?:[.]pdf|/)") = any character not followed by .pdf" or /"
  • *? = match as few characters as possible
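
To see how the pieces above combine, here is a minimal sketch that runs the same pattern against a few links. It uses Python's `re` module purely for illustration (the question targets PHP's preg_match_all); the function name `find_hrefs` and the sample markup are assumptions made up for this example.

```python
import re

# Pattern from the answer above, unchanged.
PATTERN = re.compile(r'href="(?:(?<!images).(?!(?:[.]pdf|/)"))*?"')


def find_hrefs(html):
    """Return every href attribute accepted by PATTERN (hypothetical helper)."""
    # No capturing groups are used, so findall returns the full matches.
    return PATTERN.findall(html)


# Hypothetical sample markup, invented for this illustration.
sample = '''
<a href="/testing/abc/">end with slash</a>
<a href="/testing/test/mnl">no ending slash</a>
<a href="/images/logo">contains images</a>
<a href="/docs/manual.pdf">pdf file</a>
'''

print(find_hrefs(sample))  # ['href="/testing/test/mnl"']
```

Two limits are worth keeping in mind. The pattern only handles double-quoted attributes, and the look-behind is only checked before each consumed character, so an href that ends exactly in images (for example /foo/images) still matches.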

I found a way to exclude links that end in .pdf by modifying the answer from the other question. I'm still looking into why it doesn't exclude the images example, though.

href=(['"])[^\s]+(?<![\/]|\.pdf)\1

Link to a working test: http://www.rubular.com/r/jmBVstpGZD
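
As a rough sketch of how this second approach could be extended to cover the images case too, the adaptation below runs in Python's `re` module, used here only for illustration. Several things are assumptions of this sketch rather than part of the answer: the look-behind is split in two because Python's `re` requires fixed-width look-behinds, the dot before pdf is matched literally, quote characters are excluded from the URL body so a match cannot run past its own attribute, and the look-ahead that rejects images is a hypothetical addition. The names `HREF_RE` and `urls_without_trailing_slash` are invented for this example.

```python
import re

# Adapted pattern (assumptions of this sketch, see the text above):
#   (['"])               opening quote, captured as group 1
#   (?![^\s'"]*images)   hypothetical addition: reject URLs containing images
#   ([^\s'"]+)           the URL itself, captured as group 2
#   (?<!/)(?<!\.pdf)     URL must not end in a slash or in .pdf
#   \1                   the same quote character closes the attribute
HREF_RE = re.compile(
    r'''href=(['"])(?![^\s'"]*images)([^\s'"]+)(?<!/)(?<!\.pdf)\1'''
)


def urls_without_trailing_slash(html):
    """Return the URLs accepted by HREF_RE (hypothetical helper)."""
    return [m.group(2) for m in HREF_RE.finditer(html)]


# Hypothetical sample markup, invented for this illustration.
sample = '''
<a href="/testing/abc/">end with slash</a>
<a href="/testing/test/mnl">no ending slash</a>
<a href='/images/logo'>contains images</a>
<a href="/docs/manual.pdf">pdf file</a>
'''

print(urls_without_trailing_slash(sample))  # ['/testing/test/mnl']
```

Putting the images check in a look-ahead right after the opening quote tests the whole URL once, which is simpler to reason about than checking every character with a look-behind.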
