PHP regex to identify specific URL patterns

Question

I have been trying to identify URL patterns on a page. I followed the approach below but ran into an issue.

PHP regex used:

~((https?://)?([-\w]+\.[-\w\.]+)+\w(:\d+)?(/([-\w/_\.]*(\?\S+)?)?)*)~i

This identifies almost all types of URLs, such as the following:

example.com
www.example.com
http://example.com
http://www.example.com    
https://example.com
https://www.example.com

Unfortunately, it also treats decimal values, price values, phone numbers, and IP addresses as URLs (I had not considered them earlier). To fix this, I used the pattern below to find the specific numeric patterns to exclude:

/^[0-9]+(\.[0-9]{1,})+\S+\w?$/

Using this fixed the URL identifier by excluding numeric values such as:

Decimal values (1.11)

IP addresses (123.123.123.123)

Price values ($11.11)
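The two-step approach described above might look roughly like the sketch below. The two patterns are copied from the question; the helper name filterUrlCandidates and the sample text are assumptions made only for illustration. It collects every URL candidate first, then drops the candidates that the numeric pattern matches.

```php
<?php
// Patterns copied from the question.
$urlPattern = '~((https?://)?([-\w]+\.[-\w\.]+)+\w(:\d+)?(/([-\w/_\.]*(\?\S+)?)?)*)~i';
$numericPattern = '/^[0-9]+(\.[0-9]{1,})+\S+\w?$/';

// Hypothetical helper: collect URL candidates, then drop the numeric ones.
function filterUrlCandidates(string $text, string $urlPattern, string $numericPattern): array
{
    preg_match_all($urlPattern, $text, $matches);
    // PREG_GREP_INVERT keeps only the candidates that do NOT match the numeric pattern.
    $kept = preg_grep($numericPattern, $matches[0], PREG_GREP_INVERT);
    return array_values($kept); // reindex from 0
}

// Sample text (an assumption for this sketch).
$text = 'example.com https://www.example.com 1.11 123.123.123.123 $11.11';
print_r(filterUrlCandidates($text, $urlPattern, $numericPattern));
```

Note that the price is captured as 11.11 (the $ is not part of the URL pattern), which is why the anchored numeric pattern can still reject it.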

Now there is a new issue: abbreviations are also treated as URLs, for example:

WHO (in any letter case)

So, how can I write a URL-identifying PHP regex that excludes the problem cases mentioned above?

or

Can I have a PHP regex that identifies purely alphabetic values such as abbreviations, like the example above?

Thanks

Answer 1

You may put these exclusions into a negative lookahead and use

$re = '~(?x)\b                   # Word boundary
   (?!                           # Exclusion list (negative lookahead)
     [A-Z](?:\.[A-Z])+\b         # An uppercase letter, then 1+ sequences of . and an uppercase letter
     |                           # or
     \d+(?:\.\d+)+\S+\b          # 1+ digits, 1+ sequences of . and digits, then 1+ non-whitespace chars
   )
   (?:https?://)?                # Optional http / https protocol part
   (?:[-\w]+\.[-\w.]+)+          # 1+ sequences of 1+ - or word chars, then . and 1+ -, ., or word chars
   \w(?::\d+)?                   # Word char and 1 optional sequence of : and 1+ digits
   (?:/(?:[-\w/.]*(?:\?\S+)?)?)* # 0+ sequences of /, 0+ -, word, /, . symbols, then 1 optional sequence of ? and 1+ non-whitespace chars
   \b~';                         # Word boundary
$str = 'example.com  www.example.com  http://example.com http://www.example.com     https://example.com https://www.example.com  Decimal Values (1.11)  IP Address (123.123.123.123)   W.H.O   Price values ($11.11)';
preg_match_all($re, $str, $matches);
print_r($matches[0]);             // Only the six URLs are matched

See the PHP demo online, and a regex demo here.
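The question mentions abbreviations "in any letter case", while the exclusion above only covers uppercase letters, so a lowercase w.h.o would still be matched. One possible variation, shown here only as a sketch, wraps the abbreviation branch in an inline case-insensitive group (?i:...). The function name extractUrls is hypothetical, and the rest of the pattern is the one from this answer.

```php
<?php
// Hypothetical wrapper around the pattern from the answer, with the
// abbreviation exclusion made case-insensitive (an assumption, not part
// of the original answer).
function extractUrls(string $text): array
{
    $re = '~\b
       (?!                             # Exclusion list
         (?i:[a-z](?:\.[a-z])+)\b      # Dotted abbreviation in any letter case
         |                             # or
         \d+(?:\.\d+)+\S+\b            # Numeric values such as 1.11 or an IP address
       )
       (?:https?://)?                  # Optional protocol part
       (?:[-\w]+\.[-\w.]+)+            # Host name with at least one dot
       \w(?::\d+)?                     # Last host char and an optional port
       (?:/(?:[-\w/.]*(?:\?\S+)?)?)*   # Optional path and query string
       \b~x';
    preg_match_all($re, $text, $matches);
    return $matches[0];
}

print_r(extractUrls('W.H.O w.h.o example.com 1.11 https://www.example.com'));
```

Short domains such as t.co are still matched, because the abbreviation branch requires a word boundary right after a single letter. An undotted WHO contains no dot at all, so neither pattern treats it as a URL.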

Solution 1
0 Accepted 2017-01-03 10:07:35