PHP regex to identify specific URL patterns
I have been trying to identify URL patterns in a page. I used the regex below, but I have run into an issue.
PHP regex used:
~((https?://)?([-\w]+\.[-\w\.]+)+\w(:\d+)?(/([-\w/_\.]*(\?\S+)?)?)*)~i
This identifies almost all types of URLs, such as:
example.com
www.example.com
http://example.com
http://www.example.com
https://example.com
https://www.example.com
Unfortunately, it also treats decimal values, price values, phone numbers, and IP addresses as URLs (I may not have considered them earlier). To fix this, I used the pattern below to find the specific numeric patterns that should be excluded:
/^[0-9]+(\.[0-9]{1,})+\S+\w?$/
Using this fixed the URL identifier by excluding numeric values such as:
Decimal values (1.11)
IP addresses (123.123.123.123)
Price values ($11.11)
Now comes a new issue: abbreviations are also treated as URLs, for example:
W.H.O (in any letter case)
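The situation described so far can be reproduced with a small two-step sketch: collect candidates with the URL pattern, then drop the ones the numeric pattern recognises. Both patterns are the ones quoted above; the sample text and variable names are made up for illustration, and the abbreviation is assumed to be written with dots (W.H.O).

```php
<?php
// Both patterns are copied unchanged from the question.
$urlRe = '~((https?://)?([-\w]+\.[-\w\.]+)+\w(:\d+)?(/([-\w/_\.]*(\?\S+)?)?)*)~i';
$numRe = '/^[0-9]+(\.[0-9]{1,})+\S+\w?$/';

// Hypothetical sample text covering each case from the question.
$text = 'https://www.example.com 1.11 123.123.123.123 $11.11 W.H.O';

// Step 1: collect every URL-like candidate.
preg_match_all($urlRe, $text, $m);

// Step 2: drop candidates that the numeric pattern recognises.
$kept = array_values(array_filter($m[0], function ($candidate) use ($numRe) {
    return preg_match($numRe, $candidate) !== 1;
}));

print_r($kept); // the abbreviation W.H.O is still in the list
```

The numeric filter removes 1.11, 123.123.123.123 and 11.11, but W.H.O survives because it contains no digits.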
So, how can I write a URL-identifying PHP regex that excludes the problem cases mentioned above?
or
Can I have a PHP regex that identifies single-letter sequences such as the abbreviation in the example above?
Thanks
You may put these exclusions into a negative lookahead and use:
$re = '~(?x)\b                    # word boundary
(?!                               # exclusion list: fail if what follows is
  [A-Z](?:\.[A-Z])+\b             # an uppercase letter, then 1+ sequences of . and an uppercase letter
  |                               # or
  \d+(?:\.\d+)+\S+\b              # digits, 1+ sequences of . and digits, then 1+ non-whitespace chars
)
(?:https?://)?                    # optional http:// or https:// protocol part
(?:[-\w]+\.[-\w.]+)+              # 1+ sequences of 1+ - or word chars, then . and 1+ -, ., or word chars
\w(?::\d+)?                       # a word char, then an optional : and 1+ digits (port)
(?:/(?:[-\w/.]*(?:\?\S+)?)?)*     # 0+ sequences of /, then 0+ -, word, / or . chars, then an optional ? and 1+ non-whitespace chars
\b~';                             # word boundary
$str = 'example.com www.example.com http://example.com http://www.example.com https://example.com https://www.example.com Deciaml Values (1.11) IP Address (123.123.123.123) W.H.O Price values ($11.11)';
preg_match_all($re, $str, $matches);
print_r($matches[0]);
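The question asks for abbreviations in any letter case, while the `[A-Z]` branch above only covers uppercase. One possible variant, sketched here as an assumption rather than as part of the answer, wraps that branch in an inline `(?i:...)` group and puts the pattern into a helper. The function name `extract_urls` and the sample sentence are hypothetical.

```php
<?php
// Hypothetical helper: returns all URL-like matches found in $text.
function extract_urls(string $text): array
{
    $re = '~(?x)\b
        (?!                              # exclusion list
            (?i:[a-z](?:\.[a-z])+)\b     # dotted abbreviation in any letter case, e.g. W.H.O or w.h.o
          |
            \d+(?:\.\d+)+\S+\b           # dotted numbers: decimals, prices, IP addresses
        )
        (?:https?://)?                   # optional protocol
        (?:[-\w]+\.[-\w.]+)+             # dotted host-like part
        \w(?::\d+)?                      # last host char and optional port
        (?:/(?:[-\w/.]*(?:\?\S+)?)?)*    # optional path and query string
        \b~';
    preg_match_all($re, $text, $matches);
    return $matches[0];
}

$found = extract_urls(
    'w.h.o and W.H.O are not URLs, but https://www.example.com/path?x=1 is; so is example.com, unlike 1.11'
);
print_r($found);
```

With this sample, only https://www.example.com/path?x=1 and example.com are returned. A trade-off to keep in mind: the abbreviation branch also rejects a match that starts at single-letter labels separated by dots, so a host such as a.b.example.com would only be matched from a later position.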
See the PHP demo online, and a regex demo here.