I am trying to figure out a regular expression for the following:
<tr class="A">.*</tr><tr class="(B|C)">.*</tr>
The second tr class will repeat an unknown number of times, with something unknown in between repetitions, but simply putting it in parentheses and adding a plus doesn't work.
Here's the PHP code that didn't work:
$pattern = '/<tr\ class=\"A\">.*(<tr\ class=\"(B|C)\">.*<\/tr>.*)+/';
preg_match_all($pattern,$playerHtml,$scores);
But it only returns the first
Here's an example of something that should match:
<tr class="A">blah</tr>blah
<tr class="B">blah</tr>blah
<tr class="B">blah</tr>blah
<tr class="C">blah</tr>
This only matches blahblahblah
For your particular example, this regex will do:
/<tr class="A">.*?<\/tr>.*\n?(<tr class="[BC]">.*?<\/tr>.*\n?)+/
Hope you can tweak it if need be. See the codepad demo here.
I needed to include \n newline characters for it to work.
Because they are TR elements outside of TABLE elements, I had a hard time seeing the result of the preg_match_all function (because my browser immediately stripped the random TR elements). You may have had similar problems. I used htmlspecialchars() in the demo to output the regex match.
Also, it's improper to have text between two TR elements:
<tr></tr>blah<tr></tr>
So you should be careful about doing that.
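For anyone who can't open the demo, here is a minimal sketch of the same check. It assumes the sample from the question is stored in $playerHtml with plain \n line endings; the variable contents are just that sample, not real page data.

```php
<?php
// Sample input copied from the question (assumed to use "\n" line endings).
$playerHtml = <<<'HTML'
<tr class="A">blah</tr>blah
<tr class="B">blah</tr>blah
<tr class="B">blah</tr>blah
<tr class="C">blah</tr>
HTML;

$pattern = '/<tr class="A">.*?<\/tr>.*\n?(<tr class="[BC]">.*?<\/tr>.*\n?)+/';

// $count is the number of times the whole pattern matched.
$count = preg_match_all($pattern, $playerHtml, $scores);

// Escape the markup so a browser shows the tags instead of dropping them.
echo htmlspecialchars($scores[0][0]), "\n";
```

The full match is in $scores[0][0]. Note that $scores[1][0] only holds the last repetition of the group, so the individual rows still have to be pulled out of the full match in a second step.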
Try:
<tr class="A">.*</tr><tr class="((B|C)\s*)+">.*</tr>
+ indicates one or more times and * indicates zero or more times. Also, \s indicates a whitespace character.
((B|C)\s*)+ means there will be one or more blocks of (B|C)\s*.
(B|C)\s* means there will be a string that starts with B or C, optionally followed by some whitespace.
I'm on my phone so I can't test it, but what do you get in $scores with this pattern?
<tr class="A">.*</tr><tr class="((B)|(C)|[^"]+)+">.*</tr>
preg_match_all will look for your whole pattern multiple times. As it's found only once (I assume because the start is in $playerHtml only once), you only get one match.
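To see this behaviour, here is a small sketch using the pattern from the question. It assumes the sample is on a single line, because "." does not match newlines unless the /s modifier is set; the input string is made up from the question's example.

```php
<?php
// Single-line variant of the question's sample (an assumption for this demo).
$playerHtml = '<tr class="A">blah</tr>blah<tr class="B">blah</tr>blah'
            . '<tr class="B">blah</tr>blah<tr class="C">blah</tr>';

// Pattern exactly as posted in the question.
$pattern = '/<tr\ class=\"A\">.*(<tr\ class=\"(B|C)\">.*<\/tr>.*)+/';
$count = preg_match_all($pattern, $playerHtml, $scores);

var_dump($count);         // how many times the whole pattern matched
var_dump($scores[1][0]);  // what the repeated group kept
```

The whole pattern matches once, and a repeated capture group keeps only the text of its last iteration, so the individual rows are not available as separate entries in $scores.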
Instead, first look for your whole pattern and extract the part you're interested in, then continue with that segment:
// Step 1: capture the whole run of B/C rows in one group (/s lets "." span newlines).
$pattern = '/<tr class="A">.*?<\/tr>((?:.*?<tr class="(?:B|C)">.*?<\/tr>)+)/s';
$r = preg_match($pattern, $playerHtml, $matches);
if (1 !== $r) throw new Exception('Regex failed or did not match.');
list(, $scoreHtml) = $matches;
// Step 2: pull the individual rows out of that segment.
$r = preg_match_all('/<tr class="(?:B|C)">.*?<\/tr>/s', $scoreHtml, $scores);
if (FALSE === $r) throw new Exception('Regex failed.');
This code is quickly written and may not work as-is; it's just to illustrate that you need to do multiple steps.
However, if you use an HTML parser instead of regular expressions, I bet it's much quicker to obtain the values you're after with a small XPath query:
//tr[@class="B" or @class="C"]
This selects all <tr> elements with the classes you're looking for. Much easier.
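A rough sketch of how that query could be run with PHP's bundled DOM extension. The sample markup is hypothetical: the rows are wrapped in a <table> with <td> cells so the parser keeps them, and the values 10, 20 and 30 are invented for illustration.

```php
<?php
// Hypothetical sample data; real pages will look different.
$playerHtml = '<table>'
    . '<tr class="A"><td>header</td></tr>'
    . '<tr class="B"><td>10</td></tr>'
    . '<tr class="B"><td>20</td></tr>'
    . '<tr class="C"><td>30</td></tr>'
    . '</table>';

$doc = new DOMDocument();
// Collect parser warnings internally instead of printing them;
// real-world HTML is often not valid.
libxml_use_internal_errors(true);
$doc->loadHTML($playerHtml);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$rows = $xpath->query('//tr[@class="B" or @class="C"]');

// Gather the text of each matching row.
$scores = [];
foreach ($rows as $row) {
    $scores[] = trim($row->textContent);
}
print_r($scores);
```

Note that @class="B" only matches when the attribute is exactly "B"; rows with several classes would need something like contains().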