
PHP regex to identify specific URL patterns

I have been trying to identify URL patterns in a page. For this I used the regex below, but I have run into an issue.

-> PHP regex used:

~((https?://)?([-\w]+\.[-\w\.]+)+\w(:\d+)?(/([-\w/_\.]*(\?\S+)?)?)*)~i

This identifies almost all types of URLs, like the ones below:

example.com
www.example.com
http://example.com
http://www.example.com    
https://example.com
https://www.example.com

But unfortunately it also treats decimal values, price values, phone numbers, and IP addresses as URLs (maybe I had not considered them earlier). So to fix this I used the pattern below to find the specific numeric patterns to exclude:

/^[0-9]+(\.[0-9]{1,})+\S+\w?$/

Using this fixed the URL identifier by excluding numeric values like:

Decimal values (1.11)

IP address (123.123.123.123)

Price values ($11.11)
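
The two-step approach described above can be sketched roughly as follows: collect candidates with the URL pattern, then drop every candidate that the numeric pattern matches. Both patterns are copied from this question as-is; the variable names and the sample text are assumptions made up for illustration.

```php
<?php
// Both patterns are taken verbatim from the question.
$urlPattern     = '~((https?://)?([-\w]+\.[-\w\.]+)+\w(:\d+)?(/([-\w/_\.]*(\?\S+)?)?)*)~i';
$numericPattern = '/^[0-9]+(\.[0-9]{1,})+\S+\w?$/';

// Hypothetical sample text mixing URLs with numeric look-alikes.
$text = 'Visit www.example.com or https://example.com for 1.11 or $11.11, host 123.123.123.123';

// Step 1: collect everything the URL pattern considers a URL.
preg_match_all($urlPattern, $text, $m);
$candidates = $m[0];

// Step 2: keep only the candidates that do NOT match the numeric pattern.
// PREG_GREP_INVERT preserves the original keys, so reindex with array_values().
$urls = array_values(preg_grep($numericPattern, $candidates, PREG_GREP_INVERT));

print_r($urls); // www.example.com and https://example.com remain
```

Note that the price is captured as 11.11 (without the $), because the URL pattern cannot start on a $ sign; the numeric pattern then filters it out.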

Now comes a new issue: "Abbreviations are also considered as URLs"

W.H.O (in any letter case)

So, how can I write a URL-identifying PHP regex that excludes the problem cases mentioned above?

or

Can I have a PHP regex that identifies sequences of single letters, such as the abbreviation in the example above?

Thanks

You may put these exclusions into a negative lookahead and use

$re = '~(?x)\b                   # Word boundary
   (?!                           # Exclusion list: fail if what follows is
     [A-Z](?:\.[A-Z])+\b         # an uppercase letter and 1+ sequences of . + an uppercase letter (W.H.O)
     |                           # or
     \d+(?:\.\d+)+\S+\b          # digits, 1+ sequences of . + digits, then 1+ non-whitespace chars (1.11)
   )
   (?:https?://)?                # Optional http / https protocol part
   (?:[-\w]+\.[-\w.]+)+          # 1+ sequences of 1+ - or word chars, then . and 1+ -, ., or word chars
   \w(?::\d+)?                   # A word char and an optional sequence of : and 1+ digits (port)
   (?:/(?:[-\w/.]*(?:\?\S+)?)?)* # 0+ sequences of /, 0+ -, word, /, . chars, then an optional ? and 1+ non-whitespace chars
   \b~';                         # Word boundary
$str = 'example.com  www.example.com  http://example.com http://www.example.com     https://example.com https://www.example.com  Decimal Values (1.11)  IP Address (123.123.123.123)   W.H.O   Price values ($11.11)';
preg_match_all($re, $str, $matches);
print_r($matches[0]);

See the PHP demo online, and a regex demo here.
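
The pattern above has no i flag, so its abbreviation branch only rejects uppercase forms, and a lowercase w.h.o would still be reported as a URL. Since the question asks for any letter case, here is a minimal sketch of one possible variation: the abbreviation branch is wrapped in an inline (?i:...) group. The function name extractUrls and the sample string are assumptions for illustration, not part of the original answer.

```php
<?php
// Hypothetical helper wrapping the pattern from the answer. The only change
// is the (?i:...) group, which makes the abbreviation branch case-insensitive.
function extractUrls(string $text): array
{
    $re = '~(?x)\b
       (?!                             # Exclusion list
         (?i:[a-z](?:\.[a-z])+\b)      # single letters separated by dots, any case
         |                             # or
         \d+(?:\.\d+)+\S+\b            # dotted numbers such as 1.11 or an IP address
       )
       (?:https?://)?                  # optional protocol
       (?:[-\w]+\.[-\w.]+)+            # host labels separated by dots
       \w(?::\d+)?                     # last host char and optional port
       (?:/(?:[-\w/.]*(?:\?\S+)?)?)*   # optional path and query string
       \b~';
    preg_match_all($re, $text, $matches);
    return $matches[0];
}

print_r(extractUrls('W.H.O w.h.o W.h.O example.com 1.11 http://www.example.com'));
// Only example.com and http://www.example.com are returned.
```

One limitation to keep in mind: a host that starts with single-letter labels, such as a.b.example.com, has the same shape as an abbreviation at its first label, so the lookahead rejects that position and the match is truncated to b.example.com. The original uppercase-only branch behaves the same way for A.B.example.com.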
