
Regular Expression repetition of class

I am trying to figure out a regular expression for the following:

<tr class="A">.*</tr><tr class="(B|C)">.*</tr>

Now the second tr class will repeat an unknown number of times, with something unknown in between repetitions, but simply putting it in parentheses and adding a plus doesn't work.

Here's the PHP code that didn't work:

$pattern = '/<tr\ class=\"A\">.*(<tr\ class=\"(B|C)\">.*<\/tr>.*)+/';
preg_match_all($pattern,$playerHtml,$scores);

But it only returns the first one.

Here's an example of something that should match:

<tr class="A">blah</tr>blah
<tr class="B">blah</tr>blah
<tr class="B">blah</tr>blah
<tr class="C">blah</tr>

This only matches blahblahblah

For your particular example, this regex will do:

/<tr class="A">.*?<\/tr>.*\n?(<tr class="[BC]">.*?<\/tr>.*\n?)+/

Hope you can tweak it if need be. See the codepad demo here.

I needed to include \n newline characters for it to work, because . does not match a newline unless the s modifier is set.
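To see why the newlines matter, here is a minimal sketch. The two-row input string is made up for illustration; the only point is that . stops at a line break by default and crosses it once the s modifier is added.

```php
<?php
// Hypothetical two-line input: the B row sits on the line after the A row.
$html = "<tr class=\"A\">blah</tr>blah\n<tr class=\"B\">blah</tr>";

// Without the s modifier, .* cannot cross the "\n", so there is no match.
$withoutS = preg_match('/<tr class="A">.*<tr class="B">/', $html);

// With the s modifier, . also matches "\n" and the pattern matches.
$withS = preg_match('/<tr class="A">.*<tr class="B">/s', $html);

var_dump($withoutS, $withS);
```

Either approach works: match the \n explicitly, as in the regex above, or let . match it through the s modifier.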

Because they are TR elements outside of TABLE elements, I had a hard time seeing the result of the preg_match_all function (because my browser immediately stripped the random TR elements). You may have had similar problems. I used htmlspecialchars() in the demo to output the regex match.
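A rough reconstruction of what such a demo might look like is below. It assumes the sample rows from the question are separated by "\n"; the variable names are only illustrative. Note that the repeated group keeps just its last repetition, so the whole block is in $m[0].

```php
<?php
// Sample input copied from the question; "\n" line endings are an assumption.
$html = "<tr class=\"A\">blah</tr>blah\n"
      . "<tr class=\"B\">blah</tr>blah\n"
      . "<tr class=\"B\">blah</tr>blah\n"
      . "<tr class=\"C\">blah</tr>";

$pattern = '/<tr class="A">.*?<\/tr>.*\n?(<tr class="[BC]">.*?<\/tr>.*\n?)+/';

if (preg_match($pattern, $html, $m) === 1) {
    // $m[0] is the whole matched block; escape it so a browser shows the tags.
    echo htmlspecialchars($m[0]), "\n";
    // $m[1] only holds the last repetition of the group (the C row here).
    echo htmlspecialchars($m[1]), "\n";
}
```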

Also, it's improper to have text between two TR elements:

<tr></tr>blah<tr></tr>

So you should be careful about doing that.

Try:

 <tr class="A">.*</tr><tr class="((B|C)\s*)+">.*</tr>

+ means one or more times and * means zero or more times. Also, \s indicates a whitespace character.

((B|C)\s*)+ means there will be one or more blocks of (B|C)\s*

(B|C)\s* means a B or a C, optionally followed by whitespace.
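As a quick check of what that repeated group accepts, here is a small sketch. It anchors the group with ^ and $ and runs it against a few made-up class values; the sample strings are assumptions, not data from the question. Note that this repeats class names inside a single attribute value; on its own it does not repeat whole rows.

```php
<?php
// The repeated group from the pattern above, anchored so the whole value must match.
$pattern = '/^((B|C)\s*)+$/';

// Hypothetical attribute values to test.
$samples = ['B', 'B C', 'C  B C', '', 'A'];

$results = [];
foreach ($samples as $s) {
    // true when the value is one or more B/C names separated by optional whitespace
    $results[$s] = preg_match($pattern, $s) === 1;
}

var_export($results);
```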

I'm on my phone so I can't test it, but what do you get in $scores with this pattern?

<tr class="A">.*</tr><tr class="((B)|(C)|[^"]+)+">.*</tr>

preg_match_all will look for your whole pattern multiple times.

As it's found only once (I assume because the start is in $playerHtml only once), you only get one match.
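A small sketch of that behaviour, using the pattern from the question on a made-up single-line input (the row contents are hypothetical): the pattern matches once overall, and the repeated group ends up holding a single row rather than all of them.

```php
<?php
// Hypothetical single-line input with one A row followed by three B/C rows.
$playerHtml = '<tr class="A">blah</tr>blah'
            . '<tr class="B">one</tr>blah'
            . '<tr class="B">two</tr>blah'
            . '<tr class="C">three</tr>';

// Pattern copied from the question.
$pattern = '/<tr\ class=\"A\">.*(<tr\ class=\"(B|C)\">.*<\/tr>.*)+/';
preg_match_all($pattern, $playerHtml, $scores);

// One overall match, because class="A" occurs only once.
echo count($scores[0]), "\n";
// The capture group holds just one row, not every repetition.
echo $scores[1][0], "\n";
```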

Instead, first look for your whole pattern and extract the part you're interested in, then continue with that segment:

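// Step 1: find the class="A" row and capture the whole run of B/C rows after it.
// (?:...) stops the repeated group from overwriting the capture; s lets . match newlines.
$pattern = '/<tr class="A">.*?<\/tr>((?:.*?<tr class="[BC]">.*?<\/tr>)+)/s';
$r = preg_match($pattern, $playerHtml, $matches);
if (1 !== $r) throw new Exception('Regex failed or did not match.');

list(,$scoreHtml) = $matches;

// Step 2: extract each B/C row from the captured segment.
$r = preg_match_all('/(<tr class="(B|C)">.*?<\/tr>)/s', $scoreHtml, $scores);
if (FALSE === $r) throw new Exception('Regex failed.');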

This code was written quickly and is untested; it only illustrates that you need multiple steps.

However, if you use an HTML parser instead of regular expressions, I bet it's much quicker to obtain the values you're after with a small XPath query:

//tr[@class="B" or @class="C"]

This selects all <tr> elements with the classes you look for. Much easier.
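For example, the query could be run with PHP's DOM extension roughly like this. This is a sketch under a few assumptions: the rows are wrapped in a table, and the cell contents are invented for illustration.

```php
<?php
// Hypothetical input: the rows from the question wrapped in a proper table.
$html = '<table>'
      . '<tr class="A"><td>header</td></tr>'
      . '<tr class="B"><td>one</td></tr>'
      . '<tr class="B"><td>two</td></tr>'
      . '<tr class="C"><td>three</td></tr>'
      . '</table>';

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);

$xpath = new DOMXPath($doc);
$rows  = $xpath->query('//tr[@class="B" or @class="C"]');

$values = [];
foreach ($rows as $row) {
    // Record each row's class together with its text content.
    $values[] = $row->getAttribute('class') . ': ' . trim($row->textContent);
}

print_r($values);
```

The comparison @class="B" only matches when the attribute is exactly B; rows carrying several classes would need a different test.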

