Simple regex seems to cause infinite loop in PHP

The following 2 lines are my code:

$rank_content = file_get_contents('https://www.championsofregnum.com/index.php?l=1&ref=gmg&sec=42&world=2');
$tmp_ = preg_replace("/.+width=.16.> /Uis", "", $rank_content, 1);

The second line above causes an infinite loop. In contrast, the following alternatives DO work:

$tmp_ = preg_replace("/.+width=.16.> /Ui", "", $rank_content, 1);
$tmp_ = preg_replace("/[^§]+width=.16.> /Uis", "", $rank_content, 1);

But sadly, they do not give me what I want: neither alternative matches across the line breaks within $rank_content.

Also, if I replace the file_get_contents call with something like

$rank_content = "asdfas\nasdfasdfaswidth=m16m> teststring";

there are no problems either, although \n represents a line break too, doesn't it?

So do I understand correctly that regex has problems handling a string with line breaks in it?

How can I filter a substring of $rank_content (which spans multiple lines) by removing lines until something like width="16" appears? (It can be seen in the site's source code.)

Replace the m modifier with the s modifier. m changes the behaviour of ^ and $ (they match at every line boundary), whereas s changes the behaviour of . (it then also matches newlines).
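To make the difference concrete, here is a minimal sketch that runs the question's pattern with and without the s modifier on the question's own test string. The variable names $haystack, $without_s and $with_s are made up for this illustration.

```php
<?php
// Test string taken from the question; it stands in for the real page.
$haystack = "asdfas\nasdfasdfaswidth=m16m> teststring";

// Without "s", "." stops at "\n", so the lazy ".+" can only start on the
// second line and the first line is left in place.
$without_s = preg_replace("/.+width=.16.> /Ui", "", $haystack, 1);

// With "s", "." also matches "\n", so everything up to and including the
// marker is removed.
$with_s = preg_replace("/.+width=.16.> /Uis", "", $haystack, 1);

var_dump($without_s === "asdfas\nteststring"); // bool(true)
var_dump($with_s === "teststring");            // bool(true)
```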

That said, you should not be parsing HTML with regex. Seriously. Bad things happen.
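As a rough sketch of what a parser-based approach could look like, the following uses PHP's DOM extension. The HTML fragment is invented for the example (the real ranking page's markup is not reproduced here), and the assumption that the wanted text sits next to an element carrying width="16" is only a guess based on the question.

```php
<?php
// Hypothetical fragment standing in for the ranking page's markup.
$html = '<table><tr><td><img src="a.gif" width="16"> Alice</td></tr>'
      . '<tr><td><img src="b.gif" width="16"> Bob</td></tr></table>';

// Collect parser warnings about sloppy real-world HTML instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$names = [];
// Select every element with width="16" and read the text of its parent cell.
foreach ($xpath->query('//*[@width="16"]') as $node) {
    $names[] = trim($node->parentNode->textContent);
}

print_r($names);
```

Because the parser works on the document tree, line breaks and the overall length of the page no longer matter to the lookup.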

I give up on it. The problem seems to be the LENGTH of the haystack variable $rank_content: it is about 90,000 characters long, while the maximum length that regex match() handles appears to be about 30,000, so I guess the same applies to regex replace(). The problem could surely be solved, if anybody is interested; have a look at this link: PHP preg_match_all limit
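For anyone who wants to check whether PCRE itself is giving up on the long input: the preg_* functions return null when the engine hits an internal error (for example the pcre.backtrack_limit setting), and preg_last_error() reports which one. A minimal sketch, where strip_before_marker is a hypothetical helper name and $sample is a small stand-in for the real page:

```php
<?php
// Hypothetical helper: wraps the question's preg_replace() call and turns a
// silent PCRE failure into an exception.
function strip_before_marker(string $haystack, string $pattern): string
{
    $result = preg_replace($pattern, '', $haystack, 1);
    if ($result === null) {
        // preg_last_error() returns one of the PREG_*_ERROR constants.
        throw new RuntimeException(
            'preg_replace failed, PCRE error code ' . preg_last_error()
        );
    }
    return $result;
}

// Small stand-in input; the real page is about 90,000 characters long.
$sample = "line one\nline two width=\"16\"> rest of page";
echo strip_before_marker($sample, '/.+width=.16.> /Uis'), "\n";
```

If the exception fires on the real page, code that keeps using the null result afterwards would be a plausible source of the apparent endless loop, though that is only a guess.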

I myself am going to solve the problem using another method for reading the contents of a website, such as HtmlUnit, or maybe by retrieving the site line by line.
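The line-by-line idea can also be sketched without any regex. This assumes the marker appears literally as width="16"> followed by a space (the question's pattern only says "any character" around the 16), and $rank_content here is a made-up stand-in for the downloaded page.

```php
<?php
// Hypothetical multi-line input standing in for the downloaded page.
$rank_content = "<html>\n<body>\n<img width=\"16\"> first\nsecond\n";
$marker = 'width="16"> ';   // assumed literal form of the marker

$kept = [];
$found = false;
foreach (explode("\n", $rank_content) as $line) {
    if (!$found) {
        $pos = strpos($line, $marker);
        if ($pos === false) {
            continue;       // still before the marker: drop the line
        }
        $found = true;
        // Keep only what follows the marker on this line.
        $line = substr($line, $pos + strlen($marker));
    }
    $kept[] = $line;
}

// Note: unlike preg_replace(), this yields an empty string if the marker
// never appears.
$tmp_ = implode("\n", $kept);
var_dump($tmp_ === "first\nsecond\n"); // bool(true)
```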

