I have this regular expression:
preg_match_all("/<a\\s.*?href\\s*=\\s*['|\\"](.*?)(?=#|\\"|')/si", $data, $matches);
to find all urls, it works fine, BUT how can I modificate it to find urls with question marks ONLY?
Example:
<a href="http://site.com/index.php">0</a><a href="http://site.com/index.php?id=1">1</a><a href="http://site.com/calc/index.php?id=1&scheme=Venus">2</a><a href="http://site.com/catalogue/data.php">3</a>
And preg_match_all
will return:
http://site.com/index.php?id=1
http://site.com/calc/index.php?id=1&scheme=Venus
preg_match_all("@<a\s*href\s*=[\'\"]([^\'\"]+\?[^\'\"]+)[\'\"]@si", $data, $matches);
尝试这个。
Don't try to make everything happen in one regex. Use your existing method, and then separately check the URL that you get back to see if it has a question mark in it.
That said, don't use regular expressions to parse HTML . You cannot reliably parse HTML with regular expressions, and you will face sorrow and frustration down the road. As soon as the HTML changes from your expectations, your code will be broken. See http://htmlparsing.com/php for examples of how to properly parse HTML with PHP modules that have already been written, tested and debugged.
Andy Lester gave you the answer with right thing to do.
Here's your regex though:
<a\s.*?href\s*=\s*['|\"](.*?\?.*?)(?=#|\"|')
as seen here:
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.