Regular expression to match only parent pages in URL
I have a set of URLs like the ones below.
This one should not match:
https://example.com/parent/child.html
These should match:
https://example.com/parent.html
https://example.com/parent.html/page/page-number
https://example.com/anything/page/page-number
https://example.com/anything/sub-anything
https://example.com/anything/sub-anything/page/page-number
I have searched a lot but found no solution. I tried this, but it does not work as expected:
/^(https:\/\/example\.com\/[^/]+\.html|https:\/\/example\.com\/[^/]+\.html\/(.+?)|https:\/\/example\.com\/anything\/[^/]+)$/
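'parent', 'child', 'anything', and 'sub-anything' contain only word characters, digits, '-', and '%'.
'page-number' is digits only.
What would be a good regular expression for this case?
Thanks a lot.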
Edit: changed \w to [\w\d-] to allow digits and dashes.
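This is a very lazy regex that correctly matches your test cases, but it may not be usable beyond them. If you want to attract higher-quality answers, I suggest adding more examples of negative test cases.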
https?:\/\/[\w%-]++(?:\.com)?(?(?=(\/[\w%-]+\/)[\w%-]+\.html)(?!)|.*)
If the parent depth can be greater than 1, for example https://example.com/parent/parent2/child.html, and you still don't want it to match, then the following should do the trick:
https?:\/\/[\w%-]++(?:\.com)?+(?(?=(?:\/[\w%-]+)+\/[\w%-]+\.html)(?!)|.*)
An explanation of the latter follows:
https? match "http" or "https"
:\/\/ match "://"
[\w%-]++ match any letters, numbers, '%', or '-'; don't allow backtracking (possessive)
(?:\.com)?+ match .com once if it's there, don't allow backtracking, don't store in capture group
(?(?=...) if our positive lookahead matches
(?:\/[\w%-]+)+ one or more groups of letter/number/'%'/'-' with a leading forward slash
\/[\w%-]+\.html followed by another forward slash, some letters/numbers/'%'/'-', then '.html'
(?!) fail the match
| else
.*) match whatever is left
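The patterns above rely on a lookahead conditional, (?(?=...)yes|no), which is PCRE syntax; Python's re module does not accept a lookahead as a condition. As a rough sketch of the same idea in Python, the conditional can be swapped for a plain negative lookahead placed directly after the scheme, which also avoids the need for possessive quantifiers. The names SEG, PARENT_ONLY, and is_parent_page are made up for this illustration, and using [^/]+ for the host is an assumption that is looser than the answer's [\w%-]++(?:\.com)?+.

```python
import re

# Hypothetical helper names; the segment class [\w%-]+ is taken from the answer.
SEG = r"[\w%-]+"

# Negative lookahead right after "://": reject host/dir(/dir...)/file.html,
# otherwise accept whatever follows. This replaces the PCRE conditional
# (?(?=...)(?!)|.*), it is not the answer's exact pattern.
PARENT_ONLY = re.compile(
    r"https?://"
    r"(?![^/]+(?:/" + SEG + r")+/" + SEG + r"\.html)"
    r".*"
)


def is_parent_page(url: str) -> bool:
    """Return True unless the URL is a .html file nested under a directory."""
    return PARENT_ONLY.fullmatch(url) is not None


if __name__ == "__main__":
    samples = [
        "https://example.com/parent/child.html",
        "https://example.com/parent/parent2/child.html",
        "https://example.com/parent.html",
        "https://example.com/parent.html/page/12",
        "https://example.com/anything/page/12",
        "https://example.com/anything/sub-anything",
        "https://example.com/anything/sub-anything/page/12",
    ]
    for url in samples:
        print(is_parent_page(url), url)
```

Like the original answer, this only reflects the test cases in the question; more negative examples would be needed before trusting it on other URLs.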