Parsing HTML with HTML::TreeBuilder
I want to parse an HTML page and extract the badge name, the description, and the badge type from markup like the following:
<div class="row">
<div class="span8">
<table id="badge-list">
<tr>
<td style="width: 25px;"></td>
<td style="width: 200px;" class="badge-cell">
<a class="badge-name" href="/badge/show/3/">
<span class="badge-icon bronze">•</span>
Editor
</a>
<span class="multiplier">x 3892</span></td>
<td class="earned False"> </td>
<td>First edit</td>
</tr>
My Perl code is below. I am trying to extract the a element with class="badge-name", along with the other details:
my $tree = HTML::TreeBuilder->new();
$tree->parse($content);
my ($h1) = $tree->look_down('_tag', 'table', 'id', 'badge-list');
my @tr = $h1->look_down('_tag', 'tr') ;
foreach my $tr (@tr) {
    my @tdList = $tr->look_down('_tag','td');
    foreach my $td ( @tdList) {
        if (my $a = $td->look_down('_tag','a')) {
            print $a->as_text , "\n";
            my $span = $a->look_down('_tag','span', 'class');
            print $span->attr('class');
        }
        else {
            my $text = $td->as_text , "\n";
            print "$text\n";
        }
    }
}
The code raises the warning: Wide character in print at ..
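look_down expects its criteria as attribute/value pairs. In the call below, the trailing 'class' has no value to go with it, so
$a->look_down('_tag','span', 'class')
should simply be
$a->look_down('_tag','span')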
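Putting that fix in context, here is a minimal sketch of the whole extraction. It assumes the markup has the shape shown in the question. The helper name extract_badges and the hash keys name, type and description are made up for illustration, and the heredoc stands in for $content.

```perl
use strict;
use warnings;
use utf8;                                   # the HTML literal below contains "•"
use HTML::TreeBuilder;

binmode(STDOUT, ':encoding(UTF-8)');        # avoid "Wide character in print"

# Sample input copied from the question; stands in for $content.
my $html = <<'HTML';
<table id="badge-list">
<tr>
<td style="width: 25px;"></td>
<td style="width: 200px;" class="badge-cell">
<a class="badge-name" href="/badge/show/3/">
<span class="badge-icon bronze">•</span>
Editor
</a>
<span class="multiplier">x 3892</span></td>
<td class="earned False"> </td>
<td>First edit</td>
</tr>
</table>
HTML

# Hypothetical helper: returns one hash ref per table row that has a badge link.
sub extract_badges {
    my ($content) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($content);
    my @badges;

    # Every criterion is an attribute/value pair.
    my $table = $tree->look_down(_tag => 'table', id => 'badge-list');
    my @rows  = $table ? $table->look_down(_tag => 'tr') : ();

    for my $tr (@rows) {
        # $link rather than $a, so sort's special variable is not shadowed.
        my $link = $tr->look_down(_tag => 'a', class => 'badge-name') or next;
        my $span = $link->look_down(_tag => 'span');

        # Text nodes directly under <a> are plain strings; this skips the bullet span.
        my $name = join '', grep { !ref } $link->content_list;
        $name =~ s/^\s+|\s+$//g;

        my @td = $tr->look_down(_tag => 'td');
        push @badges, {
            name        => $name,
            type        => $span ? $span->attr('class') : undef,
            description => @td ? $td[-1]->as_trimmed_text : undef,
        };
    }

    $tree->delete;                          # release the parse tree
    return @badges;
}

for my $badge (extract_badges($html)) {
    print join(' | ', map { $_ // '' } @{$badge}{qw(name type description)}), "\n";
}
```

Taking the description from the last td is an assumption based on the single row shown; a page with a different column layout would need a different index.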
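I suggest adding "use utf8;" at the top of the script. The • symbol is a wide character (a code point above 255), which is what triggers the warning when it is printed. Note that "use utf8;" only tells Perl that the source file itself is UTF-8; to print non-ASCII characters without the warning, the output handle also needs an encoding layer, for example binmode(STDOUT, ':encoding(UTF-8)').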
use utf8;
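As a rough illustration of the warning itself, the sketch below uses only core modules. It shows two common ways to print a wide character cleanly: an encoding layer on the handle, or explicit encoding to bytes. The variable names are illustrative.

```perl
use strict;
use warnings;
use utf8;                               # source is UTF-8, so "•" is read as one character
use Encode qw(encode);

my $bullet = "•";                       # U+2022, a code point above 255

# Option 1: attach an encoding layer once; later prints encode automatically.
binmode(STDOUT, ':encoding(UTF-8)');
print "$bullet Editor\n";

# Option 2: encode by hand, for handles that have no encoding layer.
my $bytes = encode('UTF-8', $bullet);
printf "characters: %d, bytes: %d\n", length($bullet), length($bytes);
```

Without either step, print receives a character that does not fit in a single byte and Perl emits the "Wide character in print" warning.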