
PHP Regular expression: Get all urls with question mark

I have this regular expression:

preg_match_all("/<a\s.*?href\s*=\s*['|\"](.*?)(?=#|\"|')/si", $data, $matches);

to find all URLs. It works fine, but how can I modify it to find ONLY URLs that contain a question mark?

Example:

<a href="http://site.com/index.php">0</a><a href="http://site.com/index.php?id=1">1</a><a href="http://site.com/calc/index.php?id=1&scheme=Venus">2</a><a href="http://site.com/catalogue/data.php">3</a>

And preg_match_all should return:

http://site.com/index.php?id=1

http://site.com/calc/index.php?id=1&scheme=Venus

preg_match_all("@<a\s*href\s*=[\'\"]([^\'\"]+\?[^\'\"]+)[\'\"]@si", $data, $matches);

Try this.

Don't try to make everything happen in one regex. Use your existing method, and then separately check the URL that you get back to see if it has a question mark in it.
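As a rough sketch of that two-step approach, the snippet below runs the pattern from the question unchanged and then filters the captured URLs with strpos. The variable name $withQuery is only illustrative, and $data is assumed to hold the sample markup from the question.

```php
<?php
// Sample markup taken from the question.
$data = '<a href="http://site.com/index.php">0</a>'
      . '<a href="http://site.com/index.php?id=1">1</a>'
      . '<a href="http://site.com/calc/index.php?id=1&scheme=Venus">2</a>'
      . '<a href="http://site.com/catalogue/data.php">3</a>';

// Step 1: the pattern from the question collects every href value.
// It is written in a single-quoted string here, so only \' needs escaping.
preg_match_all('/<a\s.*?href\s*=\s*[\'|"](.*?)(?=#|"|\')/si', $data, $matches);

// Step 2: keep only the URLs that contain a question mark.
// $withQuery is a hypothetical name chosen for this example.
$withQuery = array_values(array_filter(
    $matches[1],
    function ($url) {
        return strpos($url, '?') !== false;
    }
));

print_r($withQuery);
```

Keeping the filter separate means the extraction pattern stays as it was, and the "has a query string" rule can change later without touching the regex.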

That said, don't use regular expressions to parse HTML . You cannot reliably parse HTML with regular expressions, and you will face sorrow and frustration down the road. As soon as the HTML changes from your expectations, your code will be broken. See http://htmlparsing.com/php for examples of how to properly parse HTML with PHP modules that have already been written, tested and debugged.
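For illustration, here is a minimal sketch of the parser-based route using PHP's bundled DOM extension (DOMDocument). It is one possible way to do it, not necessarily the approach shown on the linked page, and it assumes $data holds the sample markup from the question. Unlike the regex in the question, it does not strip a trailing #fragment from the URL.

```php
<?php
// Sample markup taken from the question.
$data = '<a href="http://site.com/index.php">0</a>'
      . '<a href="http://site.com/index.php?id=1">1</a>'
      . '<a href="http://site.com/calc/index.php?id=1&scheme=Venus">2</a>'
      . '<a href="http://site.com/catalogue/data.php">3</a>';

// Collect libxml parse warnings internally instead of printing them;
// real-world HTML (e.g. a bare & in an attribute) often triggers them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($data);
libxml_clear_errors();

// Walk every <a> element and keep the href values containing "?".
$urls = [];
foreach ($doc->getElementsByTagName('a') as $anchor) {
    $href = $anchor->getAttribute('href');
    if (strpos($href, '?') !== false) {
        $urls[] = $href;
    }
}

print_r($urls);
```

Because the parser handles attribute order, quoting style, and whitespace itself, this version keeps working when the markup changes in ways the regex did not anticipate.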

Andy Lester's answer describes the right thing to do.

That said, here is your regex, modified to require a question mark inside the captured URL:

<a\s.*?href\s*=\s*['|\"](.*?\?.*?)(?=#|\"|')

as seen here:

http://rubular.com/r/LHi11VMMR9
