How do I read the values of a table in an HTML file and store them in Perl?

Question

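I have read many questions and many answers, but I couldn't find a direct answer to my problem. The answers are either very general or about something different from what I want to do. So far I've gathered that I need to use HTML::TableExtract or HTML::TreeBuilder::XPath, but I can't really get either of them to store the values. I can get the table row values somehow and display them with Dumper.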

Something like this:

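use Data::Dumper;

# $tree is the HTML::TableExtract object that parsed the page
foreach my $ts ($tree->table_states) {
    foreach my $row ($ts->rows) {
        push @fir, Dumper($row);   # Dumper returns a printable string, so this stores text dumps of each row
    }
}
print @fir;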

But this is not really what I'm looking for. Here is the structure of the HTML table whose values I want to store:

<table><caption><b>Table 1 </b>bla bla bla</caption>
<tbody>
    <tr>
        <th ><p>Foo</p>
        </th>

        <td ><p>Bar</p>
        </td>

    </tr>

    <tr>
        <th ><p>Foo-1</p>
        </th>

        <td ><p>Bar-1</p>
        </td>

    </tr>

    <tr>
        <th ><p>Formula</p>
        </th>

        <td><p>Formula1-1</p>
            <p>Formula1-2</p>
            <p>Formula1-3</p>
            <p>Formula1-4</p>
            <p>Formula1-5</p>
        </td>

    </tr>

    <tr>
        <th><p>Foo-2</p>
        </th>

        <td ><p>Bar-2</p>
        </td>

    </tr>

    <tr>
        <th ><p>Foo-3</p>
        </th>

        <td ><p>Bar-3</p>
             <p>Bar-3-1</p>
        </td>

    </tr>

</tbody>

</table>

It would be convenient if I could store the row values together in pairs.

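The expected output would be something like an array with these values: (Foo, Bar, Foo-1, Bar-1, Formula, Formula1-1 Formula1-2 Formula1-3 Formula1-4 Formula1-5, ...). Most importantly, I want to learn how to store the value of each tag and how to move around the tag tree.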
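To make the two target shapes concrete, here is a minimal plain-Perl sketch that does no HTML parsing at all: the row data is copied by hand from the sample table, and the names `@rows`, `@flat` and `%by_header` are illustrative only. It stores each row as a pair (header text plus an array ref of the cell's `<p>` texts), then derives the flat array described above and a hash for lookup by header.

```perl
use strict;
use warnings;

# Row data copied by hand from the sample table (no parsing in this sketch).
# Each element is one <tr>: [ <th> text, [ texts of the <p> elements in <td> ] ]
my @rows = (
    [ 'Foo',     ['Bar'] ],
    [ 'Foo-1',   ['Bar-1'] ],
    [ 'Formula', [ map { "Formula1-$_" } 1 .. 5 ] ],
    [ 'Foo-2',   ['Bar-2'] ],
    [ 'Foo-3',   [ 'Bar-3', 'Bar-3-1' ] ],
);

# Flat array in document order: header, then its cell values joined by spaces
my @flat = map { ( $_->[0], join( ' ', @{ $_->[1] } ) ) } @rows;

# Lookup by header text (note: a hash does not remember row order)
my %by_header = map { ( $_->[0] => $_->[1] ) } @rows;

print join( ', ', @flat ), "\n";
print "Foo-3 -> @{ $by_header{'Foo-3'} }\n";
```

Keeping the array of pairs as the primary structure preserves row order; the hash is only a convenience index built from it.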

Answer 1

Learn XPath and DOM manipulation.

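use strictures;
use HTML::TreeBuilder::XPath qw();

# Parse the HTML file into a DOM tree that supports XPath queries
my $dom = HTML::TreeBuilder::XPath->new;
$dom->parse_file('10280979.html');

# Hash slice: the text of each <th> becomes a key, and the <td> at the same
# position supplies an array ref holding the text of each of its <p> children
my %extract;
@extract{$dom->findnodes_as_strings('//th')} =
    map {[$_->findvalues('p')]} $dom->findnodes('//td');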
__END__
# %extract = (
#     Foo     => [qw(Bar)],
#     'Foo-1' => [qw(Bar-1)],
#     'Foo-2' => [qw(Bar-2)],
#     'Foo-3' => [qw(Bar-3 Bar-3-1)],
#     Formula => [qw(Formula1-1 Formula1-2 Formula1-3 Formula1-4 Formula1-5)],
# )
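The hash slice above pairs the list of all `<th>` texts with the list of all `<td>` cells by position, and a Perl hash does not keep row order. If order matters (as in the flat array the question asks for), one option is to walk the tree row by row: find each `<tr>`, then run relative XPath queries from that row. This is a sketch, not a definitive implementation; it assumes HTML::TreeBuilder::XPath is installed, uses `parse_content` on a shortened, inlined copy of the question's table (three of its rows) instead of a file, and the variable names are illustrative.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Shortened copy of the question's table, inlined so the sketch is self-contained
my $html = <<'HTML';
<table><caption><b>Table 1 </b>bla bla bla</caption>
<tbody>
<tr><th><p>Foo</p></th><td><p>Bar</p></td></tr>
<tr><th><p>Formula</p></th><td><p>Formula1-1</p><p>Formula1-2</p></td></tr>
<tr><th><p>Foo-3</p></th><td><p>Bar-3</p><p>Bar-3-1</p></td></tr>
</tbody>
</table>
HTML

my $dom = HTML::TreeBuilder::XPath->new;
$dom->parse_content($html);

# Visit each <tr> in document order and query relative to that row,
# so a header is always paired with the cell of its own row.
my @rows;
for my $tr ( $dom->findnodes('//tr') ) {
    my ($header) = $tr->findvalues('th/p');    # text of the <p> inside <th>
    my @values   = $tr->findvalues('td/p');    # one string per <p> inside <td>
    push @rows, [ $header, \@values ];
}

for my $row (@rows) {
    print "$row->[0]: @{ $row->[1] }\n";
}
```

Because every query is relative to its own `<tr>`, a row that is missing a `<td>` only affects that row instead of shifting every later pairing.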

Solution 1
3 2012-04-23 14:45:13
