Extract text between HTML tags (<tr> </tr>)

I'm trying to extract the text between all <tr> </tr> tags in a string and print them.

use strict;
use warnings;

my $HTML = '
<tr data_1="15,12,2016" data_2="1">
<td class="cl_1">11111</td>
<td class="cl_2">11111</td>
<td class="cl_3"><strong>11111</strong></td>
<td class="cl_4" colspan="3">11111</td>
</tr>
<tr data_1="16,12,2016" data_2="0">
<td class="cl_1">22222</td>
<td class="cl_2">22222</td>
<td class="cl_3"><strong>22222</strong></td>
<td class="cl_4" colspan="3">22222</td>
</tr>
<tr data_1="15,12,2016" data_2="1">
<td class="cl_1">33333</td>
<td class="cl_2">33333</td>
<td class="cl_3"><strong>33333</strong></td>
<td class="cl_4" colspan="3">33333</td>
</tr>
';

while($HTML =~ /data_2="1">(.*)<\/tr>(\R)/sg) {
    print "$1\n\n";
}

The output should be:

<td class="cl_1">11111</td>
<td class="cl_2">11111</td>
<td class="cl_3"><strong>11111</strong></td>
<td class="cl_4" colspan="3">11111</td>

<td class="cl_1">33333</td>
<td class="cl_2">33333</td>
<td class="cl_3"><strong>33333</strong></td>
<td class="cl_4" colspan="3">33333</td>

How can I extract the content of each matching <tr> element (the ones with data_2="1") as a separate block?

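Edited to account for the new restriction on which <tr> rows are wanted (only those with data_2="1"). In the original pattern, the greedy (.*) combined with /s runs on to the last </tr> in the string, so everything comes back as a single match; a non-greedy (.*?) stops at the first </tr> instead.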

my $HTML = '
<tr data_1="15,12,2016" data_2="1">
<td class="cl_1">11111</td>
<td class="cl_2">11111</td>
<td class="cl_3"><strong>11111</strong></td>
<td class="cl_4" colspan="3">11111</td>
</tr>
<tr data_1="16,12,2016" data_2="0">
<td class="cl_1">22222</td>
<td class="cl_2">22222</td>
<td class="cl_3"><strong>22222</strong></td>
<td class="cl_4" colspan="3">22222</td>
</tr>
<tr data_1="15,12,2016" data_2="1">
<td class="cl_1">33333</td>
<td class="cl_2">33333</td>
<td class="cl_3"><strong>33333</strong></td>
<td class="cl_4" colspan="3">33333</td>
</tr>
';

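# Match only <tr> tags whose attributes include data_2="1". The non-greedy
# (.*?) stops at the first </tr>, and /s lets "." match newlines.
while ($HTML =~ /<tr[^>]*data_2="1"[^>]*>(.*?)<\/tr>/sg) {
    print "$1\n\n";
}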

Output (the newlines captured just inside each <tr> are trimmed here):

<td class="cl_1">11111</td>
<td class="cl_2">11111</td>
<td class="cl_3"><strong>11111</strong></td>
<td class="cl_4" colspan="3">11111</td>

<td class="cl_1">33333</td>
<td class="cl_2">33333</td>
<td class="cl_3"><strong>33333</strong></td>
<td class="cl_4" colspan="3">33333</td>
