Matching specific regex words in url
I must admit I've never gotten used to using regex, but I recently ran into a problem where the workaround would have been more of a pain than using regex. I need to be able to match anything that follows this pattern at the beginning of a string:
{any_url_safe_word} + ( "/http://" || "/https://" || "www." ) + {any word}
So the following should match:
cars/http://google.com#test
cars/https://google.com#test
cars/www.google.com#test
The following shouldn't match:
cars/httdp://google.com#test
cars/http:/google.com#test
What I tried so far is:
^[\\w]{1,500}\\/[(http\\:\\/\\/)|(https:\\/\\/])|([www\\.])]{0,50}
but that matches cars/http from cars/httpd://google.com.
This regex could do:
^[\w\d]+\/(?:https?:\/\/)?(?:www\.)?[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}
And if you want to capture everything that comes after it, you can just add (.*) to the end.
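As a quick check, the pattern can be run against the five examples from the question. This is only a minimal PHP sketch: the constant `PREFIXED_URL` and the helper `restAfterHost` are hypothetical names introduced for illustration, and the trailing `(.*)` is the optional capture mentioned above.

```php
<?php
// Pattern from the answer above, with (.*) appended to capture the remainder.
const PREFIXED_URL = '/^[\w\d]+\/(?:https?:\/\/)?(?:www\.)?[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(.*)/';

// Hypothetical helper: returns the text after the host, or null when nothing matches.
function restAfterHost($input)
{
    return preg_match(PREFIXED_URL, $input, $m) === 1 ? $m[1] : null;
}

$inputs = array(
    'cars/http://google.com#test',
    'cars/https://google.com#test',
    'cars/www.google.com#test',
    'cars/httdp://google.com#test',
    'cars/http:/google.com#test',
);

foreach ($inputs as $input) {
    $rest = restAfterHost($input);
    echo $input, ' => ', ($rest === null ? 'no match' : 'match, rest = ' . $rest), "\n";
}
```

Because both the scheme group and the `www.` group are optional, this pattern also accepts an input such as cars/google.com, which may or may not be what you want.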
And since the more or less general list of URL-safe characters seems to be ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~:/?#[]@!$&'()*+,;= (Source), you may include that too, so you'll get (after simplification):
^[!#$&-.0-;=?-\[\]_a-z~]+\/(?:https?:\/\/)?(?:www\.)?[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}
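To see what the wider character class changes, the two patterns can be compared on a first segment that contains a hyphen. This is a rough PHP sketch; the variable names, the helper `matchesPattern`, and the input used-cars/... are made up for illustration.

```php
<?php
// Both patterns are copied from the answer above; only the first character class differs.
$wordOnly = '/^[\w\d]+\/(?:https?:\/\/)?(?:www\.)?[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}/';
$urlSafe  = '/^[!#$&-.0-;=?-\[\]_a-z~]+\/(?:https?:\/\/)?(?:www\.)?[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}/';

// Hypothetical helper: true when $pattern matches $input.
function matchesPattern($pattern, $input)
{
    return preg_match($pattern, $input) === 1;
}

$inputs = array(
    'cars/http://google.com#test',
    'used-cars/http://google.com#test', // hyphen is outside \w
    'cars/httdp://google.com#test',
);

foreach ($inputs as $input) {
    echo $input,
        ' | word-only: ', (matchesPattern($wordOnly, $input) ? 'yes' : 'no'),
        ' | url-safe: ', (matchesPattern($urlSafe, $input) ? 'yes' : 'no'),
        "\n";
}
```

The hyphenated segment is rejected by the `[\w\d]+` version and accepted by the URL-safe class. The slash is left out of that class, so it still acts as the separator between the first segment and the URL.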
<?php
$words = array(
    'cars/http://google.com#test',
    'cars/https://google.com#test',
    'cars/www.google.com#test',
    'cars/httdp://google.com#test',
    'cars/http:/google.com#test',
    'c a r s/http:/google.com#test'
);
foreach ($words as $value)
{
    /*
    ^                      - start of the string
    \S+                    - at least one non-space symbol
    \/                     - slash
    (?:https?:\/\/|www\.)  - http with possible s then ://, or www.
                             (grouped so that ^\S+\/ applies to both alternatives)
    .+                     - at least one symbol
    */
    if (preg_match('/^\S+\/(?:https?:\/\/|www\.).+/', $value))
    {
        print $value . " good\n";
    }
    else
    {
        print $value . " bad\n";
    }
}
Prints:
cars/http://google.com#test good
cars/https://google.com#test good
cars/www.google.com#test good
cars/httdp://google.com#test bad
cars/http:/google.com#test bad
c a r s/http:/google.com#test bad
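One detail worth checking in patterns like this is alternation precedence: `|` binds more loosely than concatenation, so without a group around the alternatives the `^\S+\/` prefix applies only to the first branch. The following PHP sketch (variable names are illustrative) compares an ungrouped and a grouped pattern on an input with spaces before the slash.

```php
<?php
// Ungrouped: parsed as  ^\S+\/(https?:\/\/)  OR  (www\.).+
$ungrouped = '/^\S+\/(https?:\/\/)|(www\.).+/';
// Grouped: the ^\S+\/ prefix and the trailing .+ apply to both alternatives.
$grouped = '/^\S+\/(?:https?:\/\/|www\.).+/';

$inputs = array(
    'cars/www.google.com#test',    // single segment before www.
    'c a r s/www.google.com#test', // spaces before the slash
);

foreach ($inputs as $input) {
    printf(
        "%-30s ungrouped: %d  grouped: %d\n",
        $input,
        preg_match($ungrouped, $input),
        preg_match($grouped, $input)
    );
}
```

The ungrouped pattern accepts the second input because its `(www\.).+` branch is not anchored and can match anywhere in the string, while the grouped pattern rejects it.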